Dataset Information

An explainable machine learning framework for lung cancer hospital length of stay prediction.

ABSTRACT: This work introduces a predictive Length of Stay (LOS) framework for lung cancer patients using machine learning (ML) models. The framework proposed to deal with imbalanced datasets for classification-based approaches using electronic healthcare records (EHR). We have utilized supervised ML methods to predict lung cancer inpatients LOS during ICU hospitalization using the MIMIC-III dataset. Random Forest (RF) Model outperformed other models and achieved predicted results during the three framework phases. With clinical significance features selection, over-sampling methods (SMOTE and ADASYN) achieved the highest AUC results (98% with CI 95%: 95.3-100%, and 100% respectively). The combination of Over-sampling and under-sampling achieved the second-highest AUC results (98%, with CI 95%: 95.3-100%, and 97%, CI 95%: 93.7-100% SMOTE-Tomek, and SMOTE-ENN respectively). Under-sampling methods reported the least important AUC results (50%, with CI 95%: 40.2-59.8%) for both (ENN and Tomek- Links). Using ML explainable technique called SHAP, we explained the outcome of the predictive model (RF) with SMOTE class balancing technique to understand the most significant clinical features that contributed to predicting lung cancer LOS with the RF model. Our promising framework allows us to employ ML techniques in-hospital clinical information systems to predict lung cancer admissions into ICU.

SUBMITTER: Alsinglawi B

PROVIDER: S-EPMC8755804 | biostudies-literature | 2022 Jan

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

An explainable machine learning framework for lung cancer hospital length of stay prediction.

Alsinglawi Belal B Alshari Osama O Alorjani Mohammed M Mubin Omar O Alnajjar Fady F Novoa Mauricio M Darwish Omar O

Scientific reports 20220112 1

This work introduces a predictive Length of Stay (LOS) framework for lung cancer patients using machine learning (ML) models. The framework proposed to deal with imbalanced datasets for classification-based approaches using electronic healthcare records (EHR). We have utilized supervised ML methods to predict lung cancer inpatients LOS during ICU hospitalization using the MIMIC-III dataset. Random Forest (RF) Model outperformed other models and achieved predicted results during the three framewo ...[more]

PMID: 35022512

Similar Datasets

Project description:BackgroundPostoperative length of stay is a key indicator in the management of medical resources and an indirect predictor of the incidence of surgical complications and the degree of recovery of the patient after cancer surgery. Recently, machine learning has been used to predict complex medical outcomes, such as prolonged length of hospital stay, using extensive medical information.ObjectiveThe objective of this study was to develop a prediction model for prolonged length of stay after cancer surgery using a machine learning approach.MethodsIn our retrospective study, electronic health records (EHRs) from 42,751 patients who underwent primary surgery for 17 types of cancer between January 1, 2000, and December 31, 2017, were sourced from a single cancer center. The EHRs included numerous variables such as surgical factors, cancer factors, underlying diseases, functional laboratory assessments, general assessments, medications, and social factors. To predict prolonged length of stay after cancer surgery, we employed extreme gradient boosting classifier, multilayer perceptron, and logistic regression models. Prolonged postoperative length of stay for cancer was defined as bed-days of the group of patients who accounted for the top 50% of the distribution of bed-days by cancer type.ResultsIn the prediction of prolonged length of stay after cancer surgery, extreme gradient boosting classifier models demonstrated excellent performance for kidney and bladder cancer surgeries (area under the receiver operating characteristic curve [AUC] >0.85). A moderate performance (AUC 0.70-0.85) was observed for stomach, breast, colon, thyroid, prostate, cervix uteri, corpus uteri, and oral cancers. For stomach, breast, colon, thyroid, and lung cancers, with more than 4000 cases each, the extreme gradient boosting classifier model showed slightly better performance than the logistic regression model, although the logistic regression model also performed adequately. We identified risk variables for the prediction of prolonged postoperative length of stay for each type of cancer, and the importance of the variables differed depending on the cancer type. After we added operative time to the models trained on preoperative factors, the models generally outperformed the corresponding models using only preoperative variables.ConclusionsA machine learning approach using EHRs may improve the prediction of prolonged length of hospital stay after primary cancer surgery. This algorithm may help to provide a more effective allocation of medical resources in cancer surgery.

Project description:Machine learning can predict outcomes and determine variables contributing to precise prediction, and can thus classify patients with different risk factors of outcomes. This study aimed to investigate the predictive accuracy for mortality and length of stay in intensive care unit (ICU) patients using machine learning, and to identify the variables contributing to the precise prediction or classification of patients. Patients (n = 12,747) admitted to the ICU at Chiba University Hospital were randomly assigned to the training and test cohorts. After learning using the variables on admission in the training cohort, the area under the curve (AUC) was analyzed in the test cohort to evaluate the predictive accuracy of the supervised machine learning classifiers, including random forest (RF) for outcomes (primary outcome, mortality; secondary outcome, length of ICU stay). The rank of the variables that contributed to the machine learning prediction was confirmed, and cluster analysis of the patients with risk factors of mortality was performed to identify the important variables associated with patient outcomes. Machine learning using RF revealed a high predictive value for mortality, with an AUC of 0.945 (95% confidence interval [CI] 0.922-0.977). In addition, RF showed high predictive value for short and long ICU stays, with AUCs of 0.881 (95% CI 0.876-0.908) and 0.889 (95% CI 0.849-0.936), respectively. Lactate dehydrogenase (LDH) was identified as a variable contributing to the precise prediction in machine learning for both mortality and length of ICU stay. LDH was also identified as a contributing variable to classify patients into sub-populations based on different risk factors of mortality. The machine learning algorithm could predict mortality and length of stay in ICU patients with high accuracy. LDH was identified as a contributing variable in mortality and length of ICU stay prediction and could be used to classify patients based on mortality risk.

Dataset Information

An explainable machine learning framework for lung cancer hospital length of stay prediction.

Publications

An explainable machine learning framework for lung cancer hospital length of stay prediction.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets