Dataset Information

An interpretable machine learning model of cross-sectional U.S. county-level obesity prevalence using explainable artificial intelligence.

ABSTRACT:

Background

There is considerable geographic heterogeneity in obesity prevalence across counties in the United States. Machine learning algorithms accurately predict geographic variation in obesity prevalence, but the models are often uninterpretable and viewed as black boxes.

Objective

The goal of this study is to extract knowledge from machine learning models for county-level variation in obesity prevalence.

Methods

This study shows the application of explainable artificial intelligence methods to machine learning models of cross-sectional obesity prevalence data collected from 3,142 counties in the United States. County-level features were drawn from 7 broad categories: health outcomes, health behaviors, clinical care, social and economic factors, physical environment, demographics, and severe housing conditions. Explainable methods applied to random forest prediction models include feature importance, accumulated local effects, a global surrogate decision tree, and local interpretable model-agnostic explanations (LIME).
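The feature-importance step named above can be illustrated with permutation importance, one common model-agnostic variant; the abstract does not say which variant the study used. The following is a minimal sketch, not the study's code: `toy_model` is a hypothetical stand-in for a fitted random forest, the feature names and coefficients are invented for illustration, and the data are synthetic.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: share of variance explained."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, rows, y_true, n_repeats=20, seed=0):
    """Mean drop in R^2 when one feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = r_squared(y_true, [predict(r) for r in rows])
    importances = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row, replacing only the feature under test.
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            score = r_squared(y_true, [predict(r) for r in permuted])
            drops.append(baseline - score)
        importances[name] = sum(drops) / n_repeats
    return importances


def toy_model(row):
    # Hypothetical stand-in for a fitted model; coefficients are made up.
    return 10.0 + 0.6 * row["physical_inactivity"] + 0.3 * row["diabetes"]


# Synthetic "counties"; values are illustrative, not real data.
data_rng = random.Random(42)
rows = [
    {
        "physical_inactivity": data_rng.uniform(15, 40),
        "diabetes": data_rng.uniform(5, 20),
        "noise_feature": data_rng.uniform(0, 1),
    }
    for _ in range(200)
]
y = [toy_model(r) + data_rng.gauss(0, 1) for r in rows]

importances = permutation_importance(toy_model, rows, y)
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

A feature the model never uses (here `noise_feature`) gets an importance of zero, because shuffling it leaves every prediction unchanged. In practice the same procedure would be run on held-out data with a real fitted model.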

Results

The results show that machine learning models explained 79% of the variance in obesity prevalence, with physical inactivity, diabetes, and smoking prevalence being the most important factors in predicting obesity prevalence.
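Assuming "explained 79% of the variance" refers to the usual coefficient of determination (the abstract does not define it), the figure corresponds to R^2 = 0.79 under the definition below. The symbols are illustrative notation, not taken from the source: y_i is the observed obesity prevalence in county i, \hat{y}_i the model prediction, \bar{y} the mean across the n counties evaluated.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
```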

Conclusions

Interpretable machine learning models of health behaviors and outcomes provide substantial insight into obesity prevalence variation across counties in the United States.

SUBMITTER: Allen B

PROVIDER: S-EPMC10553328 | biostudies-literature | 2023

REPOSITORIES: biostudies-literature


Publications

An interpretable machine learning model of cross-sectional U.S. county-level obesity prevalence using explainable artificial intelligence.

Allen Ben B

PLoS One, 2023-10-05, issue 10


PMID: 37796874

Similar Datasets

Project description: Failure to predict stroke promptly may delay treatment, with severe consequences such as permanent neurological damage or death. Early detection using deep learning (DL) and machine learning (ML) models can improve patient outcomes and mitigate the long-term effects of stroke. This study compares the efficacy of these models in predicting stroke. It analyzed a dataset of 663 records from patients hospitalized at Hazrat Rasool Akram Hospital in Tehran, Iran, comprising 401 healthy individuals and 262 stroke patients. Eight established models, four ML (SVM, XGB, KNN, RF) and four DL (DNN, FNN, LSTM, CNN), were used to predict stroke. 10-fold cross-validation and hyperparameter tuning were applied to limit overfitting, and Shapley Additive Explanations (SHAP) were used for interpretability. Model performance was evaluated on accuracy, specificity, sensitivity, F1-score, and ROC curve metrics. Among DL models, LSTM showed the highest sensitivity (96.15%), while FNN had the best specificity (96.0%), accuracy (96.0%), F1-score (95.0%), and ROC (98.0%). Among ML models, RF had the highest sensitivity (99.9%), accuracy (99.0%), specificity (100%), F1-score (99.0%), and ROC (99.9%). Overall, RF outperformed all other models, while the DL models surpassed the remaining ML models on most metrics. DL models (CNN, LSTM, DNN, FNN) achieved sensitivities from 93.0% to 96.15%, specificities from 80.0% to 96.0%, accuracies from 92.0% to 96.0%, F1-scores from 87.34% to 95.0%, and ROC scores from 95.0% to 98.0%. In contrast, the other ML models (KNN, XGB, SVM) showed sensitivities between 29.0% and 94.0%, specificities between 89.47% and 96.0%, accuracies between 71.0% and 95.0%, F1-scores between 44.0% and 95.0%, and ROC scores between 64.0% and 95.0%.
This study demonstrates the efficacy of DL and ML models in predicting stroke, with the RF model outperforming all others on key metrics. While DL models generally surpassed the other ML models, RF's exceptional performance highlights the potential of combining these technologies for early stroke detection, which could improve patient outcomes by helping to prevent severe consequences such as permanent neurological damage or death.
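The evaluation metrics reported in the description above (sensitivity, specificity, accuracy, F1-score) all derive from a binary confusion matrix. The sketch below shows the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the stroke study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because sensitivity and specificity are computed on different classes, a model can score high on one and low on the other, which is why the description reports them separately rather than relying on accuracy alone.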

