
Dataset Information


Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning.


ABSTRACT:

Motivation

Selecting the optimal machine learning (ML) model for a given dataset is often challenging. Automated ML (AutoML) has emerged as a powerful tool for automatically selecting ML methods and parameter settings for the prediction of biomedical endpoints. Here, we apply the tree-based pipeline optimization tool (TPOT) to predict angiographic diagnoses of coronary artery disease (CAD). With TPOT, ML models are represented as expression trees, and optimal pipelines are discovered using a stochastic search method called genetic programming. We provide some guidelines for TPOT-based ML pipeline selection and optimization based on various clinical phenotypes and high-throughput metabolic profiles in the Angiography and Genes Study (ANGES).

Results

We analyzed nuclear magnetic resonance-derived lipoprotein and metabolite profiles in the ANGES cohort with the goal of identifying the role of non-obstructive CAD patients in CAD diagnostics. We performed a comparative analysis of TPOT-generated ML pipelines and selected ML classifiers optimized with a grid search approach, applied to two phenotypic CAD profiles. TPOT generated ML pipelines that outperformed the grid-search-optimized models across multiple performance metrics, including balanced accuracy and area under the precision-recall curve. With the selected models, we demonstrated that the phenotypic profile that distinguishes non-obstructive CAD patients from patients without CAD is associated with higher precision, suggesting a discrepancy in the underlying processes between these phenotypes.
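The two metrics named above can be illustrated with a minimal, standard-library-only sketch. This is not the authors' evaluation code. The function names `balanced_accuracy` and `average_precision` are hypothetical, the labels and scores are made-up toy values rather than ANGES data, and average precision is used here as one common step-wise estimate of the area under the precision-recall curve.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = case, 0 = control)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Samples are ranked by descending score; precision is averaged over the
    ranks at which a positive sample is retrieved. Tied scores are kept in
    input order, which is a simplification made for this sketch.
    """
    pos = sum(1 for t in y_true if t == 1)
    if pos == 0:
        raise ValueError("at least one positive sample is required")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / pos


# Toy example (hypothetical values, not study data).
toy_labels = [1, 0, 1, 0, 0, 1]
toy_predictions = [1, 0, 0, 0, 1, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

toy_balanced_accuracy = balanced_accuracy(toy_labels, toy_predictions)
toy_average_precision = average_precision(toy_labels, toy_scores)
```

Balanced accuracy weights both classes equally, and the precision-recall curve focuses on the positive class, so both are less sensitive to class imbalance than plain accuracy.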

Availability and implementation

TPOT is freely available via http://epistasislab.github.io/tpot/.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Orlenko A 

PROVIDER: S-EPMC7703753 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature


Publications

Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning.

Orlenko Alena, Kofink Daniel, Lyytikäinen Leo-Pekka, Nikus Kjell, Mishra Pashupati, Kuukasjärvi Pekka, Karhunen Pekka J, Kähönen Mika, Laurikka Jari O, Lehtimäki Terho, Asselbergs Folkert W, Moore Jason H

Bioinformatics (Oxford, England), 2020 Mar 1, issue 6



Similar Datasets

| S-EPMC9204796 | biostudies-literature
| S-EPMC9291719 | biostudies-literature
| S-EPMC8023531 | biostudies-literature
| S-EPMC6405668 | biostudies-other
| S-EPMC9328568 | biostudies-literature
| S-EPMC9691305 | biostudies-literature
| S-EPMC7537993 | biostudies-literature
| S-EPMC8229983 | biostudies-literature
| S-EPMC6811630 | biostudies-literature