Dataset Information

Machine Learning Prediction Models for Mechanically Ventilated Patients: Analyses of the MIMIC-III Database.

ABSTRACT: Background: Mechanically ventilated patients in the intensive care unit (ICU) have high mortality rates. Multiple prediction scores, such as the Simplified Acute Physiology Score II (SAPS II), Oxford Acute Severity of Illness Score (OASIS), and Sequential Organ Failure Assessment (SOFA), are widely used in the general ICU population. We aimed to establish prediction models for mechanically ventilated patients by combining these disease severity scores with other features available on the first day of admission. Methods: We conducted a retrospective study of the Medical Information Mart for Intensive Care (MIMIC-III) database. The exposures of interest consisted of demographics, pre-ICU comorbidity, ICU diagnosis, disease severity scores, vital signs, and laboratory test results on the first day of ICU admission. Hospital mortality was used as the outcome. We used the machine learning methods of k-nearest neighbors (KNN), logistic regression, bagging, decision tree, random forest, Extreme Gradient Boosting (XGBoost), and neural network for model establishment. A sample of 70% of the cohort was used as the training set; the remaining 30% was used for testing. Areas under the receiver operating characteristic curves (AUCs) and calibration plots were constructed to evaluate and compare the models' performance. The importance of the risk factors was identified through the models, and the top factors were reported. Results: A total of 28,530 subjects were enrolled through screening of the MIMIC-III database. After data preprocessing, 25,659 adult patients with 66 predictors were included in the model analyses. The KNN, logistic regression, decision tree, random forest, neural network, bagging, and XGBoost models were established on the training set and achieved AUCs of 0.806, 0.818, 0.743, 0.819, 0.780, 0.803, and 0.821, respectively, on the testing set. The calibration curves of all the models, except for the neural network, performed well. The XGBoost model performed best among the seven models. The top five predictors were age, respiratory dysfunction, SAPS II score, maximum hemoglobin, and minimum lactate. Conclusion: The current study indicates that models built on risk factors from the first day can be successfully established for predicting mortality in ventilated patients. The XGBoost model performs best among the seven machine learning models.
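To make the evaluation protocol in the abstract concrete, the following is a minimal sketch of a 70/30 split and a rank-based AUC computation. It is not the authors' code: the function names, the fixed seed, and the use of plain Python lists are assumptions made for illustration, and the AUC is computed with the standard rank-sum (Mann-Whitney) identity rather than any particular library.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (training, testing) lists.

    The 0.7 default mirrors the 70/30 split described in the abstract;
    the seed is an arbitrary choice for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

In this sketch, labels are 1 for hospital death and 0 for survival, and scores are whatever risk estimates a fitted model produces for the testing set; comparing models then amounts to comparing their `roc_auc` values on the same held-out rows.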

SUBMITTER: Zhu Y

PROVIDER: S-EPMC8280779 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Machine Learning Prediction Models for Mechanically Ventilated Patients: Analyses of the MIMIC-III Database.

Zhu Yibing, Zhang Jin, Wang Guowei, Yao Renqi, Ren Chao, Chen Ge, Jin Xin, Guo Junyang, Liu Shi, Zheng Hua, Chen Yan, Guo Qianqian, Li Lin, Du Bin, Xi Xiuming, Li Wei, Huang Huibin, Li Yang, Yu Qian

Frontiers in Medicine, 2021-07-01

PMID: 34277655

Similar Datasets

Project description: Objective: The mortality rate of critically ill patients in ICUs is relatively high. To evaluate patients' mortality risk, clinicians use scoring systems such as the Acute Physiology and Chronic Health Evaluation III (APACHE III) and the Logistic Organ Dysfunction Score (LODS) to assess prognosis in ICUs. In this research, we aimed to establish and compare multiple machine learning models built on the LODS subscores and the physiology subscores of APACHE III (namely, the Acute Physiology Score III, APS III) in order to obtain better performance for ICU mortality prediction. Methods: A total of 67,748 patients from the Medical Information Mart for Intensive Care (MIMIC-IV) database were enrolled, including 7,055 deceased patients; the same number of surviving patients was selected by random downsampling, for a total of 14,110 patients included in the study. The enrolled patients were randomly divided into a training dataset (n = 9,877) and a validation dataset (n = 4,233). Fivefold cross-validation and grid search procedures were used to find and evaluate the best hyperparameters in the different machine learning models. Taking the subscores of LODS and the physiology subscores of the APACHE III scoring system as input variables, four machine learning methods (XGBoost, logistic regression, support vector machine (SVM), and decision tree) were used to establish ICU mortality prediction models, with AUCs as metrics. AUCs, specificity, sensitivity, positive predictive value, negative predictive value, and calibration curves were used to find the best model. Results: For the prediction of mortality risk in ICU patients, the AUC of the XGBoost model was 0.918 (95% CI, 0.915-0.922), and the AUCs of logistic regression, SVM, and decision tree were 0.872 (95% CI, 0.867-0.877), 0.872 (95% CI, 0.867-0.877), and 0.852 (95% CI, 0.847-0.857), respectively. The calibration curves of logistic regression and SVM performed better than the other two models in the ranges 0-40% and 70-100%, respectively, while XGBoost performed better in the range of 40-70%. Conclusions: The mortality risk of ICU patients can be better predicted from the components of the Acute Physiology Score III and the Logistic Organ Dysfunction Score with XGBoost in terms of ROC curve, sensitivity, and specificity. The XGBoost model could assist clinicians in judging the in-hospital outcome of critically ill patients, especially patients with a more uncertain survival outcome.
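The random downsampling step described in this abstract can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the study's code: the function name `downsample_majority`, the `label_of` callback, and the seed are hypothetical, and the example rows are placeholders that only reproduce the class counts reported above.

```python
import random


def downsample_majority(rows, label_of, seed=0):
    """Balance a binary dataset by randomly sampling the larger class
    down to the size of the smaller class (without replacement)."""
    rng = random.Random(seed)
    positives = [r for r in rows if label_of(r) == 1]
    negatives = [r for r in rows if label_of(r) == 0]
    minority, majority = sorted((positives, negatives), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)  # avoid leaving the classes in two contiguous blocks
    return balanced


# Placeholder cohort with the class counts from the abstract:
# 7,055 deceased (label 1) among 67,748 enrolled patients.
cohort = [1] * 7055 + [0] * (67748 - 7055)
balanced_cohort = downsample_majority(cohort, label_of=lambda y: y)
print(len(balanced_cohort))  # 14110, matching the total reported in the abstract
```

Downsampling discards most survivors, so metrics computed on the balanced set describe a 50% mortality population rather than the original ICU cohort; that trade-off is a property of the technique, independent of this sketch.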

Project description: Background: Mechanical ventilation (MV) is vital for critically ill ICU patients but carries significant mortality risks. This study aims to develop a predictive model to estimate hospital mortality among MV patients, utilizing comprehensive health data to assist ICU physicians with early-stage alerts. Methods: We developed a machine learning (ML) framework to predict hospital mortality in ICU patients receiving MV. Using the MIMIC-III database, we identified 25,202 eligible patients through ICD-9 codes. We employed backward elimination and the Lasso method, selecting 32 features based on clinical insights and literature. Data preprocessing included eliminating columns with over 90% missing data and using mean imputation for the remaining missing values. To address class imbalance, we used the Synthetic Minority Over-sampling Technique (SMOTE). We evaluated several ML models, including CatBoost, XGBoost, Decision Tree, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression, using a 70/30 train-test split. The CatBoost model was chosen for its superior performance in terms of accuracy, precision, recall, F1-score, AUROC metrics, and calibration plots. Results: The study involved a cohort of 25,202 patients on MV. The CatBoost model attained an AUROC of 0.862, an increase from an initial AUROC of 0.821, which was the best reported in the literature. It also demonstrated an accuracy of 0.789, an F1-score of 0.747, and better calibration, outperforming other models. These improvements are due to systematic feature selection and the robust gradient boosting architecture of CatBoost. Conclusion: The preprocessing methodology significantly reduced the number of relevant features, simplifying computational processes, and identified critical features previously overlooked. Integrating these features and tuning the parameters, our model demonstrated strong generalization to unseen data. This highlights the potential of ML as a crucial tool in ICUs, enhancing resource allocation and providing more personalized interventions for MV patients.
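The two preprocessing steps named in this abstract (dropping columns with over 90% missing data, then mean imputation) can be sketched as below. The table layout (a dict mapping column names to equal-length lists, with `None` marking a missing value) and both function names are assumptions for illustration; the abstract does not describe the actual data structures used.

```python
def drop_sparse_columns(table, max_missing=0.9):
    """Keep only columns whose fraction of missing values (None) does not
    exceed max_missing; 0.9 mirrors the 90% threshold in the abstract."""
    return {
        name: values
        for name, values in table.items()
        if sum(v is None for v in values) / len(values) <= max_missing
    }


def mean_impute(table):
    """Replace each missing value with the mean of the observed values
    in the same column."""
    imputed = {}
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        if not observed:
            raise ValueError(f"column {name!r} has no observed values")
        mean = sum(observed) / len(observed)
        imputed[name] = [mean if v is None else v for v in values]
    return imputed
```

Applying `drop_sparse_columns` first guarantees that every remaining column has observed values to average, so `mean_impute` only raises when it is called on unfiltered data.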

Project description: Objective: The predictors of in-hospital mortality for intensive care unit (ICU)-admitted heart failure (HF) patients remain poorly characterised. We aimed to develop and validate a prediction model for all-cause in-hospital mortality among ICU-admitted HF patients. Design: A retrospective cohort study. Setting and participants: Data were extracted from the Medical Information Mart for Intensive Care (MIMIC-III) database. Data on 1177 heart failure patients were analysed. Methods: Patients meeting the inclusion criteria were identified from the MIMIC-III database and randomly divided into a derivation group (n=825, 70%) and a validation group (n=352, 30%). Independent risk factors for in-hospital mortality were screened using the extreme gradient boosting (XGBoost) and the least absolute shrinkage and selection operator (LASSO) regression models in the derivation sample. Multivariate logistic regression analysis was used to build prediction models in the derivation group, which were then validated in the validation cohort. Discrimination, calibration and clinical usefulness of the prediction model were assessed using the C-index, calibration plot and decision curve analysis. After pairwise comparison, the best performing model was chosen to build a nomogram according to the regression coefficients. Results: Among the 1177 admissions, in-hospital mortality was 13.52%. In both groups, the XGBoost, LASSO regression and Get With the Guidelines-Heart Failure (GWTG-HF) risk score models showed acceptable discrimination. The XGBoost and LASSO regression models also showed good calibration. In pairwise comparison, the prediction effectiveness was higher with the XGBoost and LASSO regression models than with the GWTG-HF risk score model (p<0.05). The XGBoost model was chosen as our final model for its more concise and wider net benefit threshold probability range and was presented as the nomogram. Conclusions: Our nomogram enabled good prediction of in-hospital mortality in ICU-admitted HF patients, which may help clinical decision-making for such patients.

Project description: Background: For mechanically ventilated critically ill COVID-19 patients, prone positioning has quickly become an important treatment strategy; however, prone positioning is labor intensive and comes with potential adverse effects. Therefore, identifying which critically ill intubated COVID-19 patients will benefit may help allocate labor resources. Methods: From the multi-center Dutch Data Warehouse of COVID-19 ICU patients from 25 hospitals, we selected all 3619 episodes of prone positioning in 1142 invasively mechanically ventilated patients. We excluded episodes longer than 24 h. Berlin ARDS criteria were not formally documented. We used the supervised machine learning algorithms Logistic Regression, Random Forest, Naive Bayes, K-Nearest Neighbors, Support Vector Machine and Extreme Gradient Boosting on readily available and clinically relevant features to predict success of prone positioning after 4 h (window of 1 to 7 h) based on various possible outcomes. These outcomes were defined as improvements of at least 10% in PaO2/FiO2 ratio, ventilatory ratio, respiratory system compliance, or mechanical power. Separate models were created for each of these outcomes. Re-supination within 4 h after pronation was labeled as failure. We also developed models using a 20 mmHg improvement cut-off for PaO2/FiO2 ratio and using a combined outcome parameter. For all models, we evaluated feature importance expressed as contribution to predictive performance based on their relative ranking. Results: The median duration of prone episodes was 17 h (IQR 11-20, N = 2632). Despite extensive modeling using a plethora of machine learning techniques and a large number of potentially clinically relevant features, discrimination between responders and non-responders remained poor, with an area under the receiver operating characteristic curve of 0.62 for PaO2/FiO2 ratio using Logistic Regression, Random Forest and XGBoost. Feature importance was inconsistent between models for different outcomes. Notably, not even being a previous responder to prone positioning, or PEEP levels before prone positioning, provided any meaningful contribution to predicting a successful next proning episode. Conclusions: In mechanically ventilated COVID-19 patients, predicting the success of prone positioning using clinically relevant and readily available parameters from electronic health records is currently not feasible. Given the current evidence base, a liberal approach to proning in all patients with severe COVID-19 ARDS is therefore justified, in particular regardless of previous results of proning.

Project description: Nitrogen is the most limiting nutrient for turfgrass growth. Instead of pursuing the maximum yield, most turfgrass managers use nitrogen (N) to maintain a sub-maximal growth rate. Few tools or soil tests exist to help managers guide N fertilizer decisions. Turf growth prediction models have the potential to be useful, but the existing turf growth prediction model only takes temperature into account, limiting its accuracy. This study developed machine-learning-based turf growth models using the random forest (RF) algorithm to estimate short-term turfgrass clipping yield. To build the RF model, a large set of variables was extracted as predictors, including the 7-day weather, traffic intensity, soil moisture content, N fertilization rate, and the normalized difference red edge (NDRE) vegetation index. In this study, the data were collected from two putting greens where the turfgrass received traffic rates of 0 to 1,800 rounds/week, various irrigation rates to maintain the soil moisture content between 9 and 29%, and N fertilization rates of 0 to 17.5 kg ha⁻¹ applied biweekly. The RF model agreed with the actual clipping yield collected from the experimental results. Temperature and relative humidity were the most important weather factors. Including NDRE improved the prediction accuracy of the model. The highest coefficient of determination (R²) of the RF model was 0.64 for the training dataset and 0.47 for the testing dataset. This represented a large improvement over the existing growth prediction model (R² = 0.01). However, the machine-learning models created were not able to accurately predict the clipping production at other locations. Individual golf courses can create customized growth prediction models using clipping volume to eliminate the deviation caused by temporal and spatial variability. Overall, this study demonstrated the feasibility of creating machine-learning-based yield prediction models that may be able to guide N fertilization decisions on golf course putting greens and presumably other turfgrass areas.

Project description: Background: Accurate and reliable predictions of infectious disease can be valuable to public health organizations that plan interventions to decrease or prevent disease transmission. A great variety of models have been developed for this task. However, the performance of these models varies across data series. Hepatitis E, an acute liver disease, has been a major public health problem. Which model is more appropriate for predicting the incidence of hepatitis E? In this paper, three different methods are used and their performance is compared. Methods: Autoregressive integrated moving average (ARIMA), support vector machine (SVM) and long short-term memory (LSTM) recurrent neural network models were adopted and compared. ARIMA was implemented in Python with statsmodels. SVM was implemented in MATLAB with the libSVM library. The LSTM was designed by ourselves with Keras, a deep learning library. To tackle the problem of overfitting caused by limited training samples, we adopted dropout and regularization strategies in our LSTM model. Experimental data were obtained from the monthly incidence and case number of hepatitis E from January 2005 to December 2017 in Shandong province, China. We selected data from July 2015 to December 2017 to validate the models, and the rest was taken as the training set. Three metrics were applied to compare the performance of the models: root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE). Results: By analyzing the data, we took ARIMA(1, 1, 1) and ARIMA(3, 1, 2) as the monthly incidence prediction model and the case number prediction model, respectively. Cross-validation and grid search were used to optimize the parameters of SVM. Penalty coefficient C and kernel function parameter g were set to 8 and 0.125 for incidence prediction, and to 22 and 0.01 for case number prediction. The LSTM had 4 nodes. Dropout and L2 regularization parameters were set to 0.15 and 0.001, respectively. For RMSE, ARIMA, SVM and LSTM obtained 0.022, 0.0204 and 0.01 for incidence prediction, and 22.25, 20.0368 and 11.75 for case number prediction. For MAPE, the results were 23.5%, 21.7% and 15.08% for incidence prediction, and 23.6%, 21.44% and 13.6% for case number prediction. For MAE, the results were 0.018, 0.0167 and 0.011 for incidence prediction, and 18.003, 16.5815 and 9.984 for case number prediction. Conclusions: Comparing ARIMA, SVM and LSTM, we found that the nonlinear models (SVM, LSTM) outperform the linear model (ARIMA). LSTM obtained the best performance in all three metrics (RMSE, MAPE and MAE). Hence, LSTM is the most suitable for predicting hepatitis E monthly incidence and case number.
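The three error metrics used in this comparison follow standard definitions, sketched below. The function names and the toy values in the usage note are illustrative assumptions and are not taken from the hepatitis E data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: sqrt(mean((a - p)^2))."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error: mean(|a - p|)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent: 100 * mean(|a - p| / |a|).

    Undefined when any actual value is zero, so that case is rejected.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)
```

For example, with actual values `[100, 200]` and predictions `[110, 180]`, the absolute errors are 10 and 20, so MAE is 15, RMSE is the square root of 250 (about 15.81), and MAPE is 10%. RMSE penalizes the larger miss more heavily than MAE, which is why the abstract reports both.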

Project description: Objectives: To evaluate the contribution of a preextubation chest X-ray (CXR) to identify the risk of extubation failure in mechanically ventilated patients. Design: Retrospective cohort study. Setting: ICUs in a tertiary center (the Medical Information Mart for Intensive Care IV database). Patients: Patients greater than or equal to 18 years old who were mechanically ventilated and extubated after a spontaneous breathing trial. Interventions: None. Measurements and main results: Among 1,066 mechanically ventilated patients, 132 patients (12%) experienced extubation failure, defined as reintubation or death within 48 hours of extubation. To predict extubation failure, we developed the following models based on deep learning (EfficientNet) and machine learning (LightGBM) with the training data: 1) model using only the rapid-shallow breathing index (RSBI), 2) model using RSBI and CXR, 3) model using all candidate clinical predictors (i.e., patient demographics, vital signs, laboratory values, and ventilator settings) other than CXR, and 4) model using all candidate clinical predictors with CXR. We compared the predictive abilities between models with the test data to investigate the predictive contribution of CXR. The predictive ability of the model using CXR as well as RSBI was not significantly higher than that of the model using only RSBI (c-statistics, 0.56 vs 0.56; p = 0.95). The predictive ability of the model using clinical predictors with CXR was not significantly higher than that of the model using all clinical predictors other than CXR (c-statistics, 0.71 vs 0.70; p = 0.12). Based on SHapley Additive exPlanations values to interpret the model using all clinical predictors with CXR, CXR was less likely to contribute to the predictive ability than other predictors (e.g., duration of mechanical ventilation, inability to follow commands, and heart rate). Conclusions: Adding CXR to a set of other clinical predictors in our prediction model did not significantly improve the predictive ability of extubation failure in mechanically ventilated patients.
