Dataset Information

Comparison of Machine Learning Methods and Conventional Logistic Regressions for Predicting Gestational Diabetes Using Routine Clinical Data: A Retrospective Cohort Study.

ABSTRACT: Background: Gestational diabetes mellitus (GDM) contributes to adverse pregnancy and birth outcomes. In recent decades, extensive research has been devoted to the early prediction of GDM by various methods. Machine learning methods are flexible prediction algorithms with potential advantages over conventional regression. Objective: The purpose of this study was to use machine learning methods to predict GDM and compare their performance with that of logistic regressions. Methods: We performed a retrospective, observational study including women who attended their routine first hospital visits during early pregnancy and had Down's syndrome screening at 16-20 gestational weeks in a tertiary maternity hospital in China from January 1, 2013 to December 31, 2017. A total of 22,242 singleton pregnancies were included, and 3182 (14.31%) women developed GDM. Candidate predictors included maternal demographic characteristics and medical history (maternal factors) and laboratory values at early pregnancy. The models were derived from the first 70% of the data and then validated with the next 30%. The candidate predictors were used to train several machine learning models and traditional logistic regression models. Eight common machine learning methods (GBDT, AdaBoost, LGB, Logistic, Vote, XGB, Decision Tree, and Random Forest) and two common regressions (stepwise logistic regression and logistic regression with RCS) were implemented to predict the occurrence of GDM. Models were compared on discrimination and calibration metrics. Results: In the validation dataset, the machine learning and logistic regression models performed moderately (AUC 0.59-0.74). Overall, the GBDT model performed best (AUC 0.74, 95% CI 0.71-0.76) among the machine learning methods, with negligible differences between them. Fasting blood glucose, HbA1c, triglycerides, and BMI strongly contributed to GDM. A cutoff point for the predictive value at 0.3 in the GBDT model had a negative predictive value of 74.1% (95% CI 69.5%-78.2%) and a sensitivity of 90% (95% CI 88.0%-91.7%), and the cutoff point at 0.7 had a positive predictive value of 93.2% (95% CI 88.2%-96.1%) and a specificity of 99% (95% CI 98.2%-99.4%). Conclusion: In this study, we found that several machine learning methods did not outperform logistic regression in predicting GDM. We developed a model with cutoff points for risk stratification of GDM.
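The cutoff-based figures in the abstract (sensitivity and negative predictive value at 0.3, specificity and positive predictive value at 0.7) all derive from a confusion matrix built at a given threshold. The following is a minimal sketch of that calculation in plain Python. The function name `cutoff_metrics`, the "probability >= cutoff counts as positive" rule, and the toy labels and probabilities are assumptions made for illustration; they are not the study's code or data.

```python
def cutoff_metrics(y_true, y_prob, cutoff):
    """Sensitivity, specificity, PPV and NPV when prob >= cutoff is called positive.

    Illustrative helper (hypothetical name), not taken from the study.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data, invented for illustration only (1 = developed GDM, 0 = did not).
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.35, 0.1, 0.2, 0.4, 0.75, 0.25, 0.05, 0.6]

low = cutoff_metrics(y_true, y_prob, 0.3)   # rule-out style threshold
high = cutoff_metrics(y_true, y_prob, 0.7)  # rule-in style threshold
print(low)
print(high)
```

A low cutoff trades specificity for sensitivity, and a high cutoff does the reverse, which is the basis for using two thresholds to stratify risk.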

SUBMITTER: Ye Y

PROVIDER: S-EPMC7306091 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Project description: Background: Timely and accurate prediction of delayed cerebral ischemia is critical for improving the prognosis of patients with aneurysmal subarachnoid hemorrhage. Machine learning (ML) algorithms are increasingly regarded as having a higher prediction power than conventional logistic regression (LR). This study aims to construct LR and ML models and compare their prediction power on delayed cerebral ischemia (DCI) after aneurysmal subarachnoid hemorrhage (aSAH). Methods: This was a multicenter, retrospective, observational cohort study that enrolled patients with aneurysmal subarachnoid hemorrhage from five hospitals in China. A total of 404 aSAH patients were prospectively enrolled. We randomly divided the patients into training (N = 303) and validation (N = 101) cohorts in a 75:25 ratio. One LR and six popular ML algorithms were used to construct models. The area under the receiver operating characteristic curve (AUC), accuracy, balanced accuracy, confusion matrix, sensitivity, specificity, calibration curve, and Hosmer-Lemeshow test were used to assess and compare the model performance. Finally, we calculated the importance of each feature. Results: A total of 112 (27.7%) patients developed DCI. Our results showed that conventional LR, with an AUC of 0.824 (95% CI: 0.73-0.91) in the validation cohort, outperformed the k-nearest neighbor, decision tree, support vector machine, and extreme gradient boosting models, which had AUCs of 0.792 (95% CI: 0.68-0.9, P = 0.46), 0.675 (95% CI: 0.56-0.79, P < 0.01), 0.677 (95% CI: 0.57-0.77, P < 0.01), and 0.78 (95% CI: 0.68-0.87, P = 0.50), respectively. However, the random forest (RF) and artificial neural network models, which had the same AUC (0.858, 95% CI: 0.78-0.93, P = 0.26), were better than LR. The accuracy and the balanced accuracy of the RF were 20.8% and 11% higher than those of LR, and the RF also showed good calibration in the validation cohort (Hosmer-Lemeshow: P = 0.203). We found that the CT value of subarachnoid hemorrhage, WBC count, neutrophil count, CT value of cerebral edema, and monocyte count were the five most important features for DCI prediction in the RF model. We then developed an online prediction tool (https://dynamic-nomogram.shinyapps.io/DynNomapp-DCI/) based on important features to calculate DCI risk precisely. Conclusions: In this multicenter study, we found that several ML methods, particularly RF, outperformed conventional LR. Furthermore, an online prediction tool based on the RF model was developed to identify patients at high risk for DCI after SAH and facilitate timely interventions. Clinical trial registration: http://www.chictr.org.cn, unique identifier: ChiCTR2100044448.
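The model comparisons in these descriptions rest on the AUC, which can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison (the Mann-Whitney formulation), counting ties as half. The function name `auc_pairwise` and the toy scores are assumptions for illustration only; the confidence intervals and P values reported above would need additional methods not shown here.

```python
def auc_pairwise(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Illustrative helper (hypothetical name); ties count as half a win.
    Quadratic in sample size, which is fine for small validation cohorts.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example, invented for illustration: 2 negatives and 2 positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_pairwise(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the values quoted above (roughly 0.68 to 0.86) indicate moderate to good discrimination.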

Project description: Background: Generalized regression neural network (GRNN) and logistic regression (LR) are extensively used in the medical field; however, the better model for predicting stroke outcome has not been established. The primary goal of this study was to compare the accuracies of GRNN and LR models to identify the better model for the prediction of acute stroke outcome, as well as to explore useful biomarkers for predicting the prognosis of acute stroke patients. Method: In a single-center study, 216 acute stroke patients (80% for the training set and 20% for the test set) admitted to the Shenzhen Second People's Hospital between December 2019 and June 2021 were retrospectively recruited. The functional outcomes of the patients were measured using the Barthel Index (BI) on discharge. The training set was used to optimize the GRNN and LR models. The test set was used to validate and compare the performances of GRNN and LR in predicting acute stroke outcome based on the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and the Kappa value. Result: The LR analysis showed that age, the National Institutes of Health Stroke Scale score, BI, hemoglobin, and albumin were independently associated with stroke outcome. After validation in the test set using these variables, we found that the GRNN model performed better than the LR model on AUROC (0.931 vs 0.702), sensitivity (0.933 vs 0.700), specificity (0.889 vs 0.722), accuracy (0.896 vs 0.729), and the Kappa value (0.775 vs 0.416). Conclusion: Overall, the GRNN model demonstrated superior performance to the LR model in predicting the prognosis of acute stroke patients. It also has the advantage of not being affected by implicit interactions and complex relationships in the data. Thus, we suggest that GRNN could serve as the optimal statistical model for acute stroke outcome prediction. Prospective validation of the GRNN model based on more variables is required in future studies.
