
Dataset Information

Integrating biological knowledge and gene expression data using pathway-guided random forests: a benchmarking study.


ABSTRACT:

MOTIVATION: High-throughput technologies allow comprehensive characterization of individuals on many molecular levels. However, training computational models to predict disease status based on omics data is challenging. A promising solution is the integration of external knowledge about structural and functional relationships into the modeling process. We compared four published random forest-based approaches using two simulation studies and nine experimental datasets.

RESULTS: The self-sufficient prediction error approach should be applied when large numbers of relevant pathways are expected. The competing methods hunting and learner of functional enrichment should be used when low numbers of relevant pathways are expected or the most strongly associated pathways are of interest. The hybrid approach synthetic features is not recommended because of its high false discovery rate.

AVAILABILITY AND IMPLEMENTATION: An R package providing functions for data analysis and simulation is available at GitHub (https://github.com/szymczak-lab/PathwayGuidedRF). An accompanying R data package (https://github.com/szymczak-lab/DataPathwayGuidedRF) stores the processed and quality controlled experimental datasets downloaded from Gene Expression Omnibus (GEO).

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

SUBMITTER: Seifert S 

PROVIDER: S-EPMC7520048 | biostudies-literature | 2020 Aug

REPOSITORIES: biostudies-literature

Publications

Integrating biological knowledge and gene expression data using pathway-guided random forests: a benchmarking study.

Stephan Seifert, Sven Gundlach, Olaf Junge, Silke Szymczak

Bioinformatics (Oxford, England), 2020-08-01, issue 15


Similar Datasets

| S-EPMC3750505 | biostudies-literature
| S-EPMC2335306 | biostudies-literature
| S-EPMC4724236 | biostudies-literature
| S-EPMC7794504 | biostudies-literature
| S-EPMC3530909 | biostudies-literature
| S-EPMC2804301 | biostudies-literature
| S-EPMC2813864 | biostudies-literature
| S-EPMC3591263 | biostudies-literature
| S-EPMC6686255 | biostudies-literature
| S-EPMC4331277 | biostudies-literature