Unknown

Dataset Information


Data-driven human transcriptomic modules determined by independent component analysis.


ABSTRACT: BACKGROUND: Analyzing the human transcriptome is crucial to advancing precision medicine, and the more than half a million human microarray samples in the Gene Expression Omnibus (GEO) have enabled us to better characterize biological processes at the molecular level. However, transcriptomic analysis is challenging because the data are inherently noisy and high-dimensional. Gene set analysis is widely used to alleviate the issue of high dimensionality, but the user-defined choice of gene sets can introduce bias into the results. In this paper, we advocate the use of a fixed set of transcriptomic modules for such analysis. We apply independent component analysis to the large collection of microarray data in GEO in order to discover reproducible transcriptomic modules that can be used as features for machine learning. We evaluate the usability of these modules across six studies and demonstrate (1) their use as features for sample classification, including their robustness with small training sets, (2) their regularization of the data when clustering samples, and (3) the biological relevance of differentially expressed features. RESULTS: We identified 139 reproducible transcriptomic modules, which we term fundamental components (FCs). In studies with fewer than 50 samples, FC-space classification models outperformed their gene-space counterparts, with higher sensitivity (p
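The abstract does not spell out the pipeline, so the following is only a rough illustration of the general idea: run independent component analysis on a genes-by-samples expression matrix so that each component is a vector of gene weights (a candidate module), then describe any sample by its coordinates in that module space instead of by thousands of individual genes. It is a minimal NumPy sketch of symmetric FastICA with a tanh nonlinearity, assuming genes are treated as observations and samples as mixed signals. The function names `fast_ica` and `project_sample` and all parameter defaults are hypothetical. This is not the authors' implementation, and the step that selects components that are reproducible across runs is not shown.

```python
import numpy as np


def fast_ica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Minimal symmetric FastICA with a tanh nonlinearity (illustrative sketch).

    X: (n_genes, n_samples) expression matrix. Genes are treated as
    observations and samples as mixed signals, so each returned column
    is a vector of gene weights, i.e. one candidate module.
    Returns S: (n_genes, n_components).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Center each sample across genes.
    Xc = X - X.mean(axis=0)
    # Whiten: keep the top principal directions, scaled to unit variance.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :n_components] * np.sqrt(n)

    def decorrelate(W):
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        d, E = np.linalg.eigh(W @ W.T)
        return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W

    W = decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)  # g(w_i^T z) for every gene and component
        # Fixed-point update E[z g(w^T z)] - E[g'(w^T z)] w, all rows at once
        W_new = decorrelate((G.T @ Z) / n - (1.0 - G**2).mean(axis=0)[:, None] * W)
        # Converged when each row keeps its direction (|<w_new, w_old>| ~ 1)
        converged = np.max(np.abs(np.abs(np.einsum("ij,ij->i", W_new, W)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


def project_sample(S, x):
    """Least-squares coordinates of one sample (length n_genes) in module space."""
    xc = x - x.mean()  # same per-sample centering as in fast_ica
    coef, *_ = np.linalg.lstsq(S, xc, rcond=None)
    return coef
```

For example, `S = fast_ica(X, n_components=3)` followed by `project_sample(S, X[:, 0])` turns the first sample into a 3-dimensional feature vector that a classifier could use. ICA components come back in arbitrary order and sign, which is one reason that judging whether modules are reproducible across runs or datasets requires a sign-invariant matching step.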

SUBMITTER: Zhou W 

PROVIDER: S-EPMC6142401 | biostudies-literature | 2018 Sep

REPOSITORIES: biostudies-literature


Publications

Data-driven human transcriptomic modules determined by independent component analysis.

Weizhuang Zhou, Russ B. Altman

BMC Bioinformatics, 2018-09-17, 1



Similar Datasets

| S-EPMC2991480 | biostudies-literature
| S-EPMC8653613 | biostudies-literature
| S-EPMC3224102 | biostudies-literature
| S-EPMC4005775 | biostudies-literature
| S-EPMC2646728 | biostudies-literature
| S-EPMC5207692 | biostudies-literature
| S-EPMC6867364 | biostudies-literature
| S-EPMC5513449 | biostudies-other
| S-EPMC5594474 | biostudies-literature
| S-EPMC5695755 | biostudies-literature