Dataset Information

Predicting youth diabetes risk using NHANES data and machine learning.

ABSTRACT: Prediabetes and diabetes mellitus (preDM/DM) have become alarmingly prevalent among youth in recent years. However, simple questionnaire-based screening tools to reliably assess diabetes risk are only available for adults, not youth. As a first step in developing such a tool, we used a large-scale dataset from the National Health and Nutrition Examination Survey (NHANES) to examine the performance of a published pediatric clinical screening guideline in identifying youth with preDM/DM based on American Diabetes Association diagnostic biomarkers. We assessed the agreement between the clinical guideline and biomarker criteria using established evaluation measures (sensitivity, specificity, positive/negative predictive value, F-measure for the positive/negative preDM/DM classes, and Kappa). We also compared the performance of the guideline to that of machine learning (ML) based preDM/DM classifiers derived from the NHANES dataset. Approximately 29% of the 2858 youth in our study population had preDM/DM based on biomarker criteria. The clinical guideline had a sensitivity of 43.1% and specificity of 67.6%, positive/negative predictive values of 35.2%/74.5%, positive/negative F-measures of 38.8%/70.9%, and Kappa of 0.1 (95% CI: 0.06-0.14). The performance of the guideline varied across demographic subgroups. Some ML-based classifiers performed comparably to or better than the screening guideline, especially in identifying preDM/DM youth (p = 5.23 × 10^-5). We demonstrated that a recommended pediatric clinical screening guideline did not perform well in identifying preDM/DM status among youth. Additional work is needed to develop a simple yet accurate screener for youth diabetes risk, potentially by using advanced ML methods and a wider range of clinical and behavioral health data.
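The evaluation measures named in the abstract can all be derived from a 2x2 table that cross-tabulates the guideline's result against the biomarker-based status. The following is a minimal sketch under standard textbook definitions, not the authors' code; the function name `screening_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def _safe_div(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute agreement measures from 2x2 confusion-matrix counts.

    tp/fn: biomarker-positive youth flagged / missed by the screener.
    fp/tn: biomarker-negative youth flagged / not flagged by the screener.
    """
    n = tp + fp + fn + tn
    sensitivity = _safe_div(tp, tp + fn)   # recall of the positive class
    specificity = _safe_div(tn, tn + fp)   # recall of the negative class
    ppv = _safe_div(tp, tp + fp)           # precision of the positive class
    npv = _safe_div(tn, tn + fn)           # precision of the negative class

    # F-measure: harmonic mean of precision and recall, per class.
    f_pos = _safe_div(2 * ppv * sensitivity, ppv + sensitivity)
    f_neg = _safe_div(2 * npv * specificity, npv + specificity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = _safe_div(tp + tn, n)
    p_expected = _safe_div(
        (tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n
    )
    kappa = _safe_div(p_observed - p_expected, 1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_pos": f_pos,
        "f_neg": f_neg,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the study's data).
    for name, value in screening_metrics(tp=30, fp=10, fn=20, tn=40).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, the F-measure for the negative class treats the negative predictive value as precision and specificity as recall, which is consistent with the figures reported in the abstract.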

SUBMITTER: Vangeepuram N

PROVIDER: S-EPMC8160335 | biostudies-literature | 2021 May

REPOSITORIES: biostudies-literature

Publications

Predicting youth diabetes risk using NHANES data and machine learning.

Vangeepuram N, Liu B, Chiu PH, Wang L, Pandey G

Scientific Reports, 2021 May 27, issue 1

PMID: 34045491

Similar Datasets

Project description:

Background: Maternal morbidity and mortality remain critical health concerns globally. As a result, reducing the maternal mortality ratio (MMR) is part of goal 3 in the global sustainable development goals (SDGs), and previously, it was an important indicator in the Millennium Development Goals (MDGs). Therefore, identifying high-risk groups during pregnancy is crucial for decision-makers and medical practitioners to mitigate mortality and morbidity. However, obtaining accurate predictive models for maternal mortality and maternal health risks is challenging. Compared with traditional predictive models, machine learning algorithms have emerged as promising predictive modelling methods providing accurate predictive models.

Methods: This work aims to explore the potential of machine learning (ML) algorithms in maternal risk level prediction using a nationwide maternal mortality dataset from Oman for the first time. A total of 402 maternal deaths from 1991 to 2023 in Oman were included in this study. We utilised principal component analysis (PCA) in the ML algorithms and compared them to the results of model performance without PCA. We employed and compared ten ML algorithms, including decision tree (DT), random forest (RF), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Extreme Gradient Boosting (xgboost), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Logistic Regression (LR), Support Vector Machine (SVM) and Artificial Neural Network (ANN). Different metrics, including accuracy, sensitivity, precision, and the F1-score, were utilised to assess model performance.

Results: The results indicated that the RF model outperformed the other methods in predicting the risk level (low or high) with an accuracy of 75.2%, precision of 85.7% and F1-score of 73% after PCA was applied.

Conclusions: We applied several machine learning models to predict maternal risk levels for the first time using real data from Oman. RF outperformed the other algorithms in this classification problem. A reliable estimate of maternal risk level would facilitate intervention plans for medical practitioners to reduce maternal death.
