Dataset Information

DeepMicro: deep representation learning for disease prediction based on microbiome data.


ABSTRACT: The human microbiota plays a key role in human health, and growing evidence supports the potential use of the microbiome as a predictor of various diseases. However, the high dimensionality of microbiome data, often on the order of hundreds of thousands of features, combined with low sample sizes, poses a great challenge for machine learning-based prediction algorithms. This imbalance makes the data highly sparse, which prevents learning a better prediction model. Also, there has been little work on deep learning applications to microbiome data with a rigorous evaluation scheme. To address these challenges, we propose DeepMicro, a deep representation learning framework allowing for an effective representation of microbiome profiles. DeepMicro transforms high-dimensional microbiome data into a robust low-dimensional representation using various autoencoders and applies machine learning classification algorithms to the learned representation. In disease prediction, DeepMicro outperforms the current best approaches based on the strain-level marker profile in five different datasets. In addition, by significantly reducing the dimensionality of the marker profile, DeepMicro accelerates model training and hyperparameter optimization, with an 8X to 30X speedup over the basic approach. DeepMicro is freely available at https://github.com/minoh0201/DeepMicro.
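
The two-stage idea in the abstract (compress a sparse, high-dimensional profile with an autoencoder, then classify on the compressed representation) can be illustrated with a small sketch. This is not DeepMicro's code: it assumes a toy linear autoencoder trained by plain gradient descent, a nearest-centroid classifier standing in for the "machine learning classification algorithms", and synthetic binary marker profiles. All function names, sizes, and hyperparameters are hypothetical.

```python
import random

def make_profiles(n_samples=30, n_features=20, seed=0):
    """Synthetic sparse binary 'marker profiles' (hypothetical data, not from the paper)."""
    rng = random.Random(seed)
    half = n_features // 2
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = []
        for j in range(n_features):
            # Each class mostly carries markers from its own half of the features.
            in_block = (j < half) if label == 0 else (j >= half)
            row.append(1.0 if rng.random() < (0.5 if in_block else 0.05) else 0.0)
        X.append(row)
        y.append(label)
    return X, y

def encode(W, x):
    """Low-dimensional code z = W x (W has shape n_latent x n_features)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def decode(V, z):
    """Reconstruction x_hat = V z (V has shape n_features x n_latent)."""
    return [sum(v * c for v, c in zip(row, z)) for row in V]

def mean_loss(W, V, X):
    """Mean squared reconstruction error per sample."""
    total = 0.0
    for x in X:
        x_hat = decode(V, encode(W, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(X)

def train_autoencoder(X, n_latent=3, epochs=200, lr=0.01, seed=0):
    """Full-batch gradient descent on the reconstruction error of a linear autoencoder."""
    rng = random.Random(seed)
    d, n = len(X[0]), len(X)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_latent)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(n_latent)] for _ in range(d)]
    loss_start = mean_loss(W, V, X)
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_latent)]
        gV = [[0.0] * n_latent for _ in range(d)]
        for x in X:
            z = encode(W, x)
            err = [a - b for a, b in zip(decode(V, z), x)]
            # Backpropagate the squared error through the decoder to the code z.
            dz = [2.0 * sum(err[j] * V[j][k] for j in range(d)) for k in range(n_latent)]
            for j in range(d):
                for k in range(n_latent):
                    gV[j][k] += 2.0 * err[j] * z[k]
                    gW[k][j] += dz[k] * x[j]
        for j in range(d):
            for k in range(n_latent):
                V[j][k] -= lr * gV[j][k] / n
                W[k][j] -= lr * gW[k][j] / n
    return W, V, loss_start, mean_loss(W, V, X)

def fit_centroids(Z, y):
    """Nearest-centroid classifier: one mean code vector per class."""
    centroids = {}
    for label in set(y):
        rows = [z for z, t in zip(Z, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, z):
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], z)))

X, y = make_profiles()
X_train, y_train, X_test, y_test = X[:20], y[:20], X[20:], y[20:]

# Stage 1: learn the low-dimensional representation on the training profiles only.
W, V, loss_start, loss_end = train_autoencoder(X_train)
codes_train = [encode(W, x) for x in X_train]
codes_test = [encode(W, x) for x in X_test]

# Stage 2: fit a classifier on the learned codes and score it on held-out samples.
centroids = fit_centroids(codes_train, y_train)
accuracy = sum(predict(centroids, z) == t for z, t in zip(codes_test, y_test)) / len(y_test)
print(f"reconstruction loss {loss_start:.3f} -> {loss_end:.3f}, test accuracy {accuracy:.2f}")
```

Because the classifier sees only the short code vectors rather than the full profile, each fit during hyperparameter search touches far fewer features, which is the kind of saving the abstract attributes to dimensionality reduction. The real framework is available at the repository linked above.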

SUBMITTER: Oh M 

PROVIDER: S-EPMC7138789 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature

Publications

DeepMicro: deep representation learning for disease prediction based on microbiome data.

Min Oh, Liqing Zhang

Scientific Reports, 2020-04-07, issue 1


Similar Datasets

2021-06-22 | GSE175456 | GEO
| S-EPMC7085143 | biostudies-literature
| S-EPMC9343202 | biostudies-literature
2023-03-31 | GSE165175 | GEO
| S-EPMC8271634 | biostudies-literature
| S-EPMC8636933 | biostudies-literature
| S-EPMC8466762 | biostudies-literature
2023-03-31 | GSE165174 | GEO
2023-03-31 | GSE165173 | GEO
2023-03-31 | GSE165171 | GEO