Dataset Information

Bias or biology? Importance of model interpretation in machine learning studies from electronic health records.


ABSTRACT:

Objective

The rate of diabetic complication progression varies across individuals, and understanding the factors that alter this rate may uncover new clinical interventions for personalized diabetes management.

Materials and methods

We explore how well various machine learning (ML) models and types of electronic health record (EHR) data can predict fast versus slow onset of neuropathy, nephropathy, ocular disease, or cardiovascular disease using only patient data collected prior to diabetes diagnosis.

Results

We find that optimized random forest models most accurately predicted the diagnosis of a diabetic complication, with the most effective model distinguishing between fast and slow nephropathy (AUROC = 0.75). Combining all data sets gave the highest predictive performance, and among individual data types, social history or laboratory data alone were most predictive. SHapley Additive exPlanations (SHAP) model interpretation allowed for exploration of predictors of fast and slow complication diagnosis, including underlying biases present in the EHR. Patients in the fast group had more medical visits, incurring a potential informed decision bias.
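To make the AUROC figure concrete, here is a minimal sketch of how the metric can be computed from scratch. It is not the authors' pipeline. The labels and scores are made-up illustration data, chosen so the result happens to equal 0.75, and the function name `auroc` is our own.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = "fast" onset, 0 = "slow" onset.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]
result = auroc(labels, scores)
print(result)  # 12 of the 16 positive/negative pairs are ranked correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.75 means that about three out of four fast/slow pairs are ordered correctly by the model's score.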

Discussion

Our study is unique in the realm of ML studies as it leverages SHAP as a starting point to explore patient markers not routinely used in diabetes monitoring. A mix of both bias and biological processes is likely present in influencing a model's ability to distinguish between groups.
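The idea behind SHAP can be illustrated with an exact Shapley value computation on a toy model. This is only a sketch of the underlying concept: it does not use the `shap` library, and the feature names (`n_visits`, `lab_value`, `smoking`) and the risk numbers in `toy_risk` are hypothetical, not taken from the study.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average each feature's marginal
    contribution over every possible ordering of the features."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            after = value_fn(frozenset(present))
            totals[f] += after - before
    return {f: t / len(orderings) for f, t in totals.items()}


def toy_risk(known):
    """Hypothetical predicted risk given the set of known features."""
    risk = 0.10  # baseline risk with no features known
    if "n_visits" in known:
        risk += 0.20
    if "lab_value" in known:
        risk += 0.10
    if "n_visits" in known and "lab_value" in known:
        risk += 0.10  # interaction, split equally between the two
    return risk


features = ["n_visits", "lab_value", "smoking"]
phi = shapley_values(toy_risk, features)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the full prediction and the baseline. In this toy setup the visit count receives the largest attribution, which shows how a healthcare-utilization feature can dominate an explanation whether or not it reflects biology. Exact enumeration scales factorially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.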

Conclusion

Overall, model interpretation is a critical step in evaluating the validity of a user-intended endpoint for a model when using EHR data, and predictors affected by bias and those driven by biologic processes should be equally recognized.

SUBMITTER: Momenzadeh A 

PROVIDER: S-EPMC9360778 | biostudies-literature | 2022 Oct

REPOSITORIES: biostudies-literature

Publications

Bias or biology? Importance of model interpretation in machine learning studies from electronic health records.

Amanda Momenzadeh, Ali Shamsa, Jesse G. Meyer

JAMIA Open, 2022-08-08, issue 3


Similar Datasets

| S-EPMC9205775 | biostudies-literature
| S-EPMC9860485 | biostudies-literature
| S-EPMC7556423 | biostudies-literature
| S-EPMC9235137 | biostudies-literature
| S-EPMC6352440 | biostudies-literature
| S-EPMC10467215 | biostudies-literature
| S-EPMC10130143 | biostudies-literature
| S-EPMC7966799 | biostudies-literature
| S-EPMC9152701 | biostudies-literature
| S-EPMC9846699 | biostudies-literature