Dataset Information

Comparing Logistic Regression Models with Alternative Machine Learning Methods to Predict the Risk of Drug Intoxication Mortality.

ABSTRACT: (1) Medical research has shown an increasing interest in machine learning, permitting massive multivariate data analysis. Thus, we developed drug intoxication mortality prediction models, and compared machine learning models and traditional logistic regression. (2) Categorized as drug intoxication, 8,937 samples were extracted from the Korea Centers for Disease Control and Prevention (2008-2017). We trained, validated, and tested each model through data and compared their performance using three measures: Brier score, calibration slope, and calibration-in-the-large. (3) A chi-square test demonstrated that mortality risk statistically significantly differed according to severity, intent, toxic substance, age, and sex. The multilayer perceptron model (MLP) had the highest area under the curve (AUC), and lowest Brier score in training and validation phases, while the logistic regression model (LR) showed the highest AUC (0.827) and lowest Brier score (0.0307) in the testing phase. MLP also had the second-highest AUC (0.816) and second-lowest Brier score (0.003258) in the testing phase, demonstrating better performance than the decision-making tree model. (4) Given the complexity of choosing tuning parameters, LR proved competitive when using medical datasets, which require strict accuracy.

SUBMITTER: Choi Y

PROVIDER: S-EPMC7037603 | biostudies-literature | 2020 Jan

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Comparing Logistic Regression Models with Alternative Machine Learning Methods to Predict the Risk of Drug Intoxication Mortality.

Choi YoungJin Y Boo YooKyung Y

International journal of environmental research and public health 20200131 3

(1) Medical research has shown an increasing interest in machine learning, permitting massive multivariate data analysis. Thus, we developed drug intoxication mortality prediction models, and compared machine learning models and traditional logistic regression. (2) Categorized as drug intoxication, 8,937 samples were extracted from the Korea Centers for Disease Control and Prevention (2008-2017). We trained, validated, and tested each model through data and compared their performance using three ...[more]

PMID: 32023993

Similar Datasets

Project description:BackgroundIt is unclear whether machine learning methods yield more accurate electronic health record (EHR) prediction models compared with traditional regression methods.ObjectiveThe objective of this study was to compare machine learning and traditional regression models for 10-year mortality prediction using EHR data.DesignThis was a cohort study.SettingVeterans Affairs (VA) EHR data.ParticipantsVeterans age above 50 with a primary care visit in 2005, divided into separate training and testing cohorts (n= 124,360 each).Measurements and analytic methodsThe primary outcome was 10-year all-cause mortality. We considered 924 potential predictors across a wide range of EHR data elements including demographics (3), vital signs (9), medication classes (399), disease diagnoses (293), laboratory results (71), and health care utilization (149). We compared discrimination (c-statistics), calibration metrics, and diagnostic test characteristics (sensitivity, specificity, and positive and negative predictive values) of machine learning and regression models.ResultsOur cohort mean age (SD) was 68.2 (10.5), 93.9% were male; 39.4% died within 10 years. Models yielded testing cohort c-statistics between 0.827 and 0.837. Utilizing all 924 predictors, the Gradient Boosting model yielded the highest c-statistic [0.837, 95% confidence interval (CI): 0.835-0.839]. The full (unselected) logistic regression model had the highest c-statistic of regression models (0.833, 95% CI: 0.830-0.835) but showed evidence of overfitting. The discrimination of the stepwise selection logistic model (101 predictors) was similar (0.832, 95% CI: 0.830-0.834) with minimal overfitting. All models were well-calibrated and had similar diagnostic test characteristics.LimitationOur results should be confirmed in non-VA EHRs.ConclusionThe differences in c-statistic between the best machine learning model (924-predictor Gradient Boosting) and 101-predictor stepwise logistic models for 10-year mortality prediction were modest, suggesting stepwise regression methods continue to be a reasonable method for VA EHR mortality prediction model development.

Dataset Information

Comparing Logistic Regression Models with Alternative Machine Learning Methods to Predict the Risk of Drug Intoxication Mortality.

Publications

Comparing Logistic Regression Models with Alternative Machine Learning Methods to Predict the Risk of Drug Intoxication Mortality.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets