Dataset Information

A combined strategy of feature selection and machine learning to identify predictors of prediabetes.

ABSTRACT:

Objective

To identify predictors of prediabetes using feature selection and machine learning on a nationally representative sample of the US population.

Materials and methods

We analyzed n = 6346 men and women enrolled in the National Health and Nutrition Examination Survey 2013-2014. Prediabetes was defined using American Diabetes Association guidelines. The sample was randomly partitioned into training (n = 3174) and internal validation (n = 3172) sets. Feature selection algorithms were run on training data containing 156 preselected exposure variables. Four machine learning algorithms were applied to 46 exposure variables in the original training dataset and in resampled training datasets built using 4 resampling methods. Predictive models were tested on internal validation data (n = 3172) and external validation data (n = 3000) prepared from the National Health and Nutrition Examination Survey 2011-2012. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC). Predictors were assessed by odds ratios in logistic models and by variable importance in the other models. The Centers for Disease Control and Prevention (CDC) prediabetes screening tool served as the benchmark for model performance.
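The random partition and the AUROC metric described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_half` and `auroc`, the seed, and the use of integer IDs in place of survey participants are all assumptions made for illustration. AUROC is computed here in its rank-based (Mann-Whitney) form, which is one standard way to obtain it.

```python
import random

def split_half(ids, n_train, seed=0):
    """Randomly partition ids into a training list and a validation list.

    Hypothetical helper: the seed and the list-based interface are
    illustrative assumptions, not details taken from the study.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Tied scores share an average rank, so a tie counts as half a win.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of entries tied on score, starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U statistic
    return u / (n_pos * n_neg)

# Sample sizes from the abstract: 6346 participants, 3174 for training.
train_ids, valid_ids = split_half(range(6346), n_train=3174, seed=42)

# Toy labels and scores (invented for illustration only).
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the toy call, three of the four positive-negative pairs are ordered correctly, so `toy_auc` is 0.75.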

Results

Prediabetes prevalence was 23.43%. The CDC prediabetes screening tool achieved an AUROC of 64.40%. Seven optimal (≥ 70% AUROC) models identified 25 predictors, including 4 potentially novel associations: 20 were identified by both logistic and other nonlinear/ensemble models, and 5 solely by the latter. All optimal models outperformed the CDC prediabetes screening tool (P < 0.05).
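The abstract does not state which statistical test was used to compare each model with the CDC screening tool. As one possible way to make such a comparison, the sketch below uses a paired bootstrap over the validation sample. This is a swapped-in technique chosen for illustration, and the function name `bootstrap_auroc_gain`, the number of resamples, and the toy data are all assumptions.

```python
import random

def auroc(labels, scores):
    """Pairwise AUROC: share of positive-negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_gain(labels, model_scores, benchmark_scores, n_boot=500, seed=0):
    """Paired bootstrap of the AUROC difference (model minus benchmark).

    Returns the observed difference and the fraction of resamples in which
    the model was NOT better than the benchmark. That fraction is only a
    rough, one-sided stand-in for a P value, not the test used in the study.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = auroc(labels, model_scores) - auroc(labels, benchmark_scores)
    diffs = []
    while len(diffs) < n_boot:
        # Resample participants with replacement, keeping both scores paired.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that contain only one class
        m = [model_scores[i] for i in idx]
        b = [benchmark_scores[i] for i in idx]
        diffs.append(auroc(y, m) - auroc(y, b))
    frac_not_better = sum(d <= 0 for d in diffs) / n_boot
    return observed, frac_not_better

# Toy data, invented for illustration (not NHANES values).
labels = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]
model_scores = [0.1, 0.2, 0.3, 0.8, 0.4, 0.2, 0.9, 0.35, 0.1, 0.7]
benchmark_scores = [0.3, 0.5, 0.2, 0.6, 0.7, 0.1, 0.4, 0.5, 0.6, 0.8]

observed, frac_not_better = bootstrap_auroc_gain(labels, model_scores, benchmark_scores)
```

Resampling participants rather than scores keeps the two sets of predictions paired, which matters because both are evaluated on the same individuals.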

Discussion

Combined use of feature selection and machine learning increased predictive performance, outperforming the recommended screening tool. A range of predictors of prediabetes was identified.

Conclusion

This work demonstrated the value of combining feature selection with machine learning to identify a wide range of predictors that could enhance prediabetes prediction and clinical decision-making.

SUBMITTER: De Silva K

PROVIDER: S-EPMC7647289 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature

Publications

A combined strategy of feature selection and machine learning to identify predictors of prediabetes.

De Silva K, Jönsson D, Demmer RT

Journal of the American Medical Informatics Association : JAMIA, 2020 Mar 1; issue 3

PMID: 31889178

Similar Datasets

FeatureSelect: a software for feature selection based on machine learning approaches.
| S-EPMC6446290 | biostudies-literature

Antiprotozoal peptide prediction using machine learning with effective feature selection techniques.
| S-EPMC11380031 | biostudies-literature

Practical feature filter strategy to machine learning for small datasets in chemistry.
| S-EPMC11379859 | biostudies-literature

Assessment of Alzheimer-related pathologies of dementia using machine learning feature selection.
| S-EPMC9999590 | biostudies-literature

Machine-learning based feature selection for a non-invasive breathing change detection.
| S-EPMC8286592 | biostudies-literature

Feature selection and association rule learning identify risk factors of malnutrition among Ethiopian schoolchildren.
| S-EPMC10910994 | biostudies-literature

Drug Repurposing in Glioblastoma Using a Machine Learning-Based Hybrid Feature Selection Approach.
| S-EPMC12732686 | biostudies-literature

Enhanced SQL injection detection using chi-square feature selection and machine learning classifiers.
| S-EPMC12672241 | biostudies-literature

Integrating Feature Selection, Machine Learning, and SHAP Explainability to Predict Severe Acute Pancreatitis.
| S-EPMC12523390 | biostudies-literature

Identifying Optimal Machine Learning Approaches for Microbiome-Metabolomics Integration with Stable Feature Selection.
| S-EPMC12236860 | biostudies-literature