Dataset Information

BioM2: biologically informed multi-stage machine learning for phenotype prediction using omics data.

ABSTRACT: Navigating the complex landscape of high-dimensional omics data with machine learning models presents a significant challenge. The integration of biological domain knowledge into these models has shown promise in creating more meaningful stratifications of predictor variables, leading to algorithms that are both more accurate and generalizable. However, the wider availability of machine learning tools capable of incorporating such biological knowledge remains limited. Addressing this gap, we introduce BioM2, a novel R package designed for biologically informed multistage machine learning. BioM2 uses biological information to stratify and aggregate high-dimensional biological data in the context of machine learning. Applied to genome-wide DNA methylation and transcriptome-wide gene expression data, BioM2 has been shown to enhance predictive performance, surpassing traditional machine learning models that operate without the integration of biological knowledge. A key feature of BioM2 is its ability to rank predictor variables within biological categories, specifically Gene Ontology pathways. This functionality not only aids the interpretability of the results but also enables a subsequent modular network analysis of these variables, shedding light on the systems-level biology underpinning the predictive outcome. We have proposed a biologically informed multistage machine learning framework termed BioM2 for phenotype prediction based on omics data. BioM2 is available as the BioM2 package on CRAN (https://cran.r-project.org/web/packages/BioM2/index.html).
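
The abstract does not spell out the individual stages, so the following is only a rough illustration of what "stratify and aggregate" could look like. It assumes that stage 1 fits one model per pathway on that pathway's features and that stage 2 learns from the resulting pathway-level scores. The sketch is in Python for illustration only (BioM2 itself is an R package), every function name here is hypothetical and not part of the BioM2 API, and the nearest-centroid scorer is a simple stand-in for whatever learners BioM2 actually uses.

```python
import math
import random


def fit_centroid_scorer(X, y):
    """Fit a nearest-centroid scorer; returns a function giving a score between 0 and 1."""
    n_feat = len(X[0])
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    mu_pos = [sum(x[j] for x in pos) / len(pos) for j in range(n_feat)]
    mu_neg = [sum(x[j] for x in neg) / len(neg) for j in range(n_feat)]

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, mu_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, mu_neg))
        # Closer to the positive centroid gives a score above 0.5.
        z = max(-50.0, min(50.0, d_neg - d_pos))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    return score


def pathway_features(X, pathways, stage1):
    """Aggregate each sample into one stage-1 score per pathway."""
    return [
        [stage1[name]([row[j] for j in idx]) for name, idx in pathways.items()]
        for row in X
    ]


def fit_two_stage(X, y, pathways):
    """Stage 1: one scorer per pathway. Stage 2: a scorer on the pathway-level scores.

    Simplification (assumption): stage-1 scores are computed on the same samples
    used for fitting; a real pipeline would use held-out predictions instead.
    """
    stage1 = {
        name: fit_centroid_scorer([[row[j] for j in idx] for row in X], y)
        for name, idx in pathways.items()
    }
    stage2 = fit_centroid_scorer(pathway_features(X, pathways, stage1), y)
    return stage1, stage2


def predict(X, pathways, stage1, stage2):
    """Predict class labels (0 or 1) from raw features via both stages."""
    Z = pathway_features(X, pathways, stage1)
    return [1 if stage2(z) >= 0.5 else 0 for z in Z]


# Toy data (made up): 40 samples, 6 features; only features 0-2 carry signal.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(3.0 * label if j < 3 else 0.0, 1.0) for j in range(6)] for label in y]
pathways = {"pathway_A": [0, 1, 2], "pathway_B": [3, 4, 5]}  # hypothetical feature groups

stage1, stage2 = fit_two_stage(X, y, pathways)
pred = predict(X, pathways, stage1, stage2)
accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

In this toy setup the pathway-level scores act as a compact, biologically grouped representation of the original features, which is the general idea the abstract describes; the specific grouping, learners, and evaluation used by BioM2 are documented in the package itself.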

SUBMITTER: Zhang S

PROVIDER: S-EPMC11316398 | biostudies-literature | 2024 Jul

REPOSITORIES: biostudies-literature

Publications

BioM2: biologically informed multi-stage machine learning for phenotype prediction using omics data.

Shunjie Zhang, Pan Li, Shenghan Wang, Jijun Zhu, Zhongting Huang, Fuqiang Cai, Sebastian Freidel, Fei Ling, Emanuel Schwarz, Junfang Chen

Briefings in Bioinformatics, 2024-07-01, issue 5

PMID: 39126426

Similar Datasets

Phenotype prediction using biologically interpretable neural networks on multi-cohort multi-omics data.
S-EPMC11297229 | biostudies-literature

Benchmarking ensemble machine learning algorithms for multi-class, multi-omics data integration in clinical outcome prediction.
S-EPMC11926982 | biostudies-literature

DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data.
S-EPMC8281595 | biostudies-literature

A machine learning heuristic to identify biologically relevant and minimal biomarker panels from omics data.
S-EPMC4315157 | biostudies-literature

Multi-omics assists genomic prediction of maize yield with machine learning approaches.
S-EPMC10853138 | biostudies-literature

Machine learning and deep learning methods that use omics data for metastasis prediction.
S-EPMC8450182 | biostudies-literature

G2PDeep-v2: a web-based deep-learning framework for phenotype prediction and biomarker discovery using multi-omics data.
S-EPMC11418982 | biostudies-literature

Multi-omics analyses and machine learning prediction of oviductal responses in the presence of gametes and embryos.
2024-08-23 | GSE270654 | GEO

Dealing with dimensionality: the application of machine learning to multi-omics data.
S-EPMC9907220 | biostudies-literature

GLIMS: A two-stage gradual-learning method for cancer genes prediction using multi-omics data and co-splicing network.
S-EPMC10951990 | biostudies-literature