Dataset Information

ParaGSEA: a scalable approach for large-scale gene expression profiling.

ABSTRACT: More studies have been conducted using gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, due to its enormous computational overhead in the estimation of significance level step and multiple hypothesis testing step, the computation scalability and efficiency are poor on large-scale datasets. We proposed paraGSEA for efficient large-scale transcriptome data analysis. By optimization, the overall time complexity of paraGSEA is reduced from O(mn) to O(m+n), where m is the length of the gene sets and n is the length of the gene expression profiles, which contributes more than 100-fold increase in performance compared with other popular GSEA implementations such as GSEA-P, SAM-GS and GSEA2. By further parallelization, a near-linear speed-up is gained on both workstations and clusters in an efficient manner with high scalability and performance on large-scale datasets. The analysis time of whole LINCS phase I dataset (GSE92742) was reduced to nearly half hour on a 1000 node cluster on Tianhe-2, or within 120 hours on a 96-core workstation. The source code of paraGSEA is licensed under the GPLv3 and available at http://github.com/ysycloud/paraGSEA.

SUBMITTER: Peng S

PROVIDER: S-EPMC5737394 | biostudies-literature | 2017 Sep

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

paraGSEA: a scalable approach for large-scale gene expression profiling.

Peng Shaoliang S Yang Shunyun S Bo Xiaochen X Li Fei F

Nucleic acids research 20170901 17

More studies have been conducted using gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, due to its enormous computational overhead in the estimation of significance level step and multiple hypothesis testing step, the computation scalability and efficiency are poor on large-scale datasets. We proposed paraGSEA for efficient large-scale ...[more]

PMID: 28973463

Similar Datasets

Project description:Metaproteomics is used to explore the functional dynamics of microbial communities. However, acquiring metaproteomic data by tandem mass spectrometry (MS/MS) is time-consuming and resource-intensive, and there is a demand for computational methods that can be used to reduce these resource requirements. We present MetaProClust-MS1, a computational framework for microbiome feature screening developed to prioritize samples for follow-up MS/MS. In this proof-of-concept study, we tested and compared MetaProClust-MS1 results on gut microbiome data, from fecal samples, acquired using short 15-min MS1-only chromatographic gradients and MS1 spectra from longer 60-min gradients to MS/MS-acquired data. We found that MetaProClust-MS1 identified robust gut microbiome responses caused by xenobiotics with significantly correlated cluster topologies of comparable data sets. We also used MetaProClust-MS1 to reanalyze data from both a clinical MS/MS diagnostic study of pediatric patients with inflammatory bowel disease and an experiment evaluating the therapeutic effects of a small molecule on the brain tissue of Alzheimer's disease mouse models. MetaProClust-MS1 clusters could distinguish between inflammatory bowel disease diagnoses (ulcerative colitis and Crohn's disease) using samples from mucosal luminal interface samples and identified hippocampal proteome shifts of Alzheimer's disease mouse models after small-molecule treatment. Therefore, we demonstrate that MetaProClust-MS1 can screen both microbiomes and single-species proteomes using only MS1 profiles, and our results suggest that this approach may be generalizable to any proteomics experiment. MetaProClust-MS1 may be especially useful for large-scale metaproteomic screening for the prioritization of samples for further metaproteomic characterization, using MS/MS, for instance, in addition to being a promising novel approach for clinical diagnostic screening. IMPORTANCE Growing evidence suggests that human gut microbiome composition and function are highly associated with health and disease. As such, high-throughput metaproteomic studies are becoming more common in gut microbiome research. However, using a conventional long liquid chromatography (LC)-MS/MS gradient metaproteomics approach as an initial screen in large-scale microbiome experiments can be slow and expensive. To combat this challenge, we introduce MetaProClust-MS1, a computational framework for microbiome screening using MS1-only profiles. In this proof-of-concept study, we show that MetaProClust-MS1 identifies clusters of gut microbiome treatments using MS1-only profiles similar to those identified using MS/MS. Our approach allows researchers to prioritize samples and treatments of interest for further metaproteomic analyses and may be generally applicable to any proteomic analysis. In particular, this approach may be especially useful for large-scale metaproteomic screening or in clinical settings where rapid diagnostic evidence is required.

Dataset Information

ParaGSEA: a scalable approach for large-scale gene expression profiling.

Publications

paraGSEA: a scalable approach for large-scale gene expression profiling.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets