Dataset Information

Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection.

ABSTRACT: Many variable selection techniques have been proposed for the clustering of gene expression data. While these methods tend to filter out irrelevant genes and identify informative genes that contribute to a clustering solution, they are based on criteria that do not consider the potential interactive influence among individual genes. Motivated by ensemble clustering, there is strong interest in leveraging the structure of gene networks for gene selection, so that the relationship information between genes can be used effectively while the selected genes preserve all the possible clustering structures in the data.

We present a new filter method that uses gene connectivity in the gene co-expression network as the evaluation criterion for variable selection. Gene connectivity measures the importance of a gene in terms of its expression similarity with the other genes in the co-expression network. Hard threshold and soft threshold transformations are employed to construct the gene co-expression networks. Both simulation studies and real data analysis show that the network based on soft thresholding is more effective in selecting relevant variables and provides better clustering results than the hard thresholding transformation and two other canonical filter methods for variable selection. Furthermore, a new module analysis approach is proposed to reveal the higher-order organization of the gene space, where the genes of a module share significant topological similarity and are associated with a consensus partition of the sample space. We demonstrate that the identified modules can lead to biologically meaningful sample partitions that might be missed by other methods.

By leveraging the structure of the gene co-expression network, we first propose a variable selection method that selects individual genes with top connectivity. Both simulation studies and real data application demonstrate that our method performs better in terms of the reliability of the selected genes and the sample clustering results. In addition, we propose a module recovery method that can help discover novel sample partitions that might be hidden when performing clustering analyses using all available genes. The source code of our program is available at http://nba.uth.tmc.edu/homepage/liu/netVar/.
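The connectivity-based filter described in the abstract can be illustrated with a minimal sketch. This is not the authors' netVar code: the abstract does not give the exact formulas, so the sketch assumes a WGCNA-style construction in which the adjacency is the absolute Pearson correlation either raised to a power `beta` (soft threshold) or cut at `tau` (hard threshold), and connectivity is the row sum of the adjacency. The function names, `beta`, `tau`, and the synthetic data are all hypothetical.

```python
import numpy as np

def connectivity(expr, mode="soft", beta=6.0, tau=0.7):
    """Per-gene connectivity from a genes x samples expression matrix.

    Assumed definitions (not taken from the paper):
      soft: a_ij = |cor(i, j)| ** beta
      hard: a_ij = 1 if |cor(i, j)| >= tau else 0
    Connectivity of gene i is the sum of a_ij over all j != i.
    """
    corr = np.abs(np.corrcoef(expr))   # gene-by-gene |Pearson correlation|
    if mode == "soft":
        adj = corr ** beta
    elif mode == "hard":
        adj = (corr >= tau).astype(float)
    else:
        raise ValueError("mode must be 'soft' or 'hard'")
    np.fill_diagonal(adj, 0.0)         # ignore self-connections
    return adj.sum(axis=1)

def select_top_genes(expr, k, **kwargs):
    """Indices of the k genes with the highest connectivity."""
    conn = connectivity(expr, **kwargs)
    return np.argsort(conn)[::-1][:k]

# Synthetic example: genes 0-4 share one signal, genes 5-49 are pure noise.
rng = np.random.default_rng(0)
n_samples = 20
signal = rng.normal(size=n_samples)
informative = signal + 0.1 * rng.normal(size=(5, n_samples))
noise = rng.normal(size=(45, n_samples))
expr = np.vstack([informative, noise])

top_soft = select_top_genes(expr, k=5, mode="soft", beta=6.0)
top_hard = select_top_genes(expr, k=5, mode="hard", tau=0.7)
print(sorted(top_soft.tolist()), sorted(top_hard.tolist()))
```

In this toy setting both transformations pick out the co-expressed genes; the abstract's claim that soft thresholding works better concerns the authors' simulations and real data, which this sketch does not reproduce.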

SUBMITTER: Wang Z

PROVIDER: S-EPMC4035826 | biostudies-literature | 2014 May

REPOSITORIES: biostudies-literature

Publications

Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection.

Zixing Wang, F. Anthony San Lucas, Peng Qiu, Yin Liu

BMC Bioinformatics, 2014-05-20

PMID: 24885641

Similar Datasets


Leveraging pleiotropic association using sparse group variable selection in genomics data.
| S-EPMC8742466 | biostudies-literature

Clustering and variable selection in the presence of mixed variable types and missing data.
| S-EPMC6240391 | biostudies-literature

Unsupervised gene selection using biological knowledge : application in sample clustering.
| S-EPMC5700545 | biostudies-literature

Comprehensive Characterization of Multitissue Expression Landscape, Co-Expression Networks and Positive Selection in Pikeperch.
| S-EPMC8471114 | biostudies-literature

An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks.
| S-EPMC5389000 | biostudies-literature

K-Module Algorithm: An Additional Step to Improve the Clustering Results of WGCNA Co-Expression Networks.
| S-EPMC7828115 | biostudies-literature

Variable selection in omics data: A practical evaluation of small sample sizes.
| S-EPMC6013185 | biostudies-literature

Robust hypergraph regularized non-negative matrix factorization for sample clustering and feature selection in multi-view gene expression data.
| S-EPMC6805321 | biostudies-literature

Variable Ordering Selection for Cylindrical Algebraic Decomposition with Artificial Neural Networks
| S-EPMC7340889 | biostudies-literature

Improving probe set selection for microbial community analysis by leveraging taxonomic information of training sequences.
| S-EPMC3224148 | biostudies-literature