Dataset Information

The prediction of hospital length of stay using unstructured data.

ABSTRACT:

Objective

This study aimed to assess how much machine learning-based predictions of hospital length of stay (LOS) improve when clinical signs written in free text are taken into account, compared with the traditional approach of considering only structured information such as age, gender and major ICD diagnosis.

Methods

This observational retrospective cohort study analyzed patient stays with admission between 1 January and 24 September 2019. Each included stay began with admission through the Emergency Department (ED) and lasted more than two days in the subsequent service. LOS was predicted using two random forest models. The first included unstructured text extracted from electronic health records (EHRs). A word-embedding algorithm based on UMLS terminology, with exact matching restricted to patient-centric affirmation sentences, was used to process the EHR data. The second model was primarily based on structured data in the form of diagnoses coded with the International Classification of Diseases, 10th Revision (ICD-10) and triage codes (CCMU/GEMSA classifications). Variables common to both models were: age, gender, zip/postal code, LOS in the ED, recent visit flag, assigned patient ward after the ED stay and short-term ED activity. Models were trained on 80% of the data, and performance was evaluated as accuracy on the remaining 20% held out as test data.
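The idea of exact matching restricted to patient-centric affirmation sentences can be illustrated with a small sketch. This is not the authors' pipeline: the vocabulary, the concept identifiers, and the cue-word lists below are all hypothetical stand-ins (the study used UMLS terminology and a word-embedding algorithm, neither of which is reproduced here), and the sentence filter is deliberately naive.

```python
import re

# Hypothetical mini-vocabulary standing in for UMLS terms
# (term -> made-up concept identifier; not real UMLS codes).
VOCAB = {
    "chest pain": "C_CHEST_PAIN",
    "fever": "C_FEVER",
    "dyspnea": "C_DYSPNEA",
}

# Naive cue words (assumptions): a sentence containing any of these is
# treated as negated or as not being about the patient, and is skipped.
NEGATION_CUES = {"no", "not", "denies", "without"}
NON_PATIENT_CUES = {"mother", "father", "brother", "sister", "family"}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def is_patient_affirmation(sentence):
    """Return True if the sentence has no negation or non-patient cue."""
    tokens = set(tokenize(sentence))
    return not (tokens & NEGATION_CUES or tokens & NON_PATIENT_CUES)


def extract_concepts(note):
    """Exact-match vocabulary terms in affirmative, patient-centric sentences."""
    found = set()
    for sentence in re.split(r"[.!?]+", note):
        if not is_patient_affirmation(sentence):
            continue
        text = " ".join(tokenize(sentence))
        for term, concept in VOCAB.items():
            # Exact match on whole words only.
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                found.add(concept)
    return sorted(found)


note = "Patient reports chest pain since yesterday. No fever. Mother has dyspnea."
print(extract_concepts(note))  # ['C_CHEST_PAIN']
```

In this toy note, "fever" is dropped because its sentence is negated and "dyspnea" is dropped because the sentence refers to a relative, so only the affirmed, patient-centric finding remains as a candidate feature.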

Results

The model using unstructured data achieved 75.0% accuracy, compared with 74.1% for the model using structured data. The two models produced the same prediction in 86.6% of cases. In a secondary analysis restricted to intensive care patients, the accuracy of the two models was also similar (76.3% vs 75.0%).
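The two quantities reported above, per-model accuracy and the share of cases where both models give the same prediction, can be computed as in the following minimal sketch. The labels and predictions are toy values, and the two-class "short"/"long" coding is an assumption, since the abstract does not state how LOS was categorised.

```python
def accuracy(predictions, truth):
    """Fraction of predictions that equal the true label."""
    if not truth or len(predictions) != len(truth):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


def agreement(predictions_a, predictions_b):
    """Fraction of cases where two models give the same prediction."""
    if not predictions_a or len(predictions_a) != len(predictions_b):
        raise ValueError("inputs must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(predictions_a, predictions_b))
    return same / len(predictions_a)


# Toy held-out test set (hypothetical values, not study data).
truth = ["short", "long", "short", "long", "short"]
text_model = ["short", "long", "long", "long", "short"]
structured_model = ["short", "short", "long", "long", "short"]

print(accuracy(text_model, truth))              # 0.8
print(accuracy(structured_model, truth))        # 0.6
print(agreement(text_model, structured_model))  # 0.8
```

Agreement is computed without reference to the true labels, so two models can agree often while both being wrong on the same cases; it complements accuracy rather than replacing it.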

Conclusions

LOS prediction using unstructured data had similar accuracy to prediction using structured data, and unstructured data can therefore be considered useful for modelling LOS accurately.

SUBMITTER: Chrusciel J

PROVIDER: S-EPMC8684269 | biostudies-literature | 2021 Dec

REPOSITORIES: biostudies-literature


Publications

The prediction of hospital length of stay using unstructured data.

Jan Chrusciel, François Girardon, Lucien Roquette, David Laplanche, Antoine Duclos, Stéphane Sanchez

BMC Medical Informatics and Decision Making, 18 December 2021 (issue 1)


PMID: 34922532

Similar Datasets

Project description: Background: The COVID-19 pandemic has placed an unprecedented strain on health systems, with rapidly increasing demand for healthcare in hospitals and intensive care units (ICUs) worldwide. As the pandemic escalates, determining the resulting needs for healthcare resources (beds, staff, equipment) has become a key priority for many countries. Projecting future demand requires estimates of how long patients with COVID-19 need different levels of hospital care. Methods: We performed a systematic review of early evidence on length of stay (LoS) of patients with COVID-19 in hospital and in ICU. We subsequently developed a method to generate LoS distributions which combines summary statistics reported in multiple studies, accounting for differences in sample sizes. Applying this approach, we provide distributions for total hospital and ICU LoS from studies in China and elsewhere, for use by the community. Results: We identified 52 studies, the majority from China (46/52). Median hospital LoS ranged from 4 to 53 days within China, and 4 to 21 days outside of China, across 45 studies. ICU LoS was reported by eight studies (four each within and outside China), with median values ranging from 6 to 12 and 4 to 19 days, respectively. Our summary distributions have a median hospital LoS of 14 (IQR 10-19) days for China, compared with 5 (IQR 3-9) days outside of China. For ICU, the summary distributions are more similar (median (IQR) of 8 (5-13) days for China and 7 (4-11) days outside of China). There was a visible difference by discharge status, with patients who were discharged alive having longer LoS than those who died during their admission, but no trend associated with study date. Conclusion: Patients with COVID-19 in China appeared to remain in hospital for longer than elsewhere. This may be explained by differences in criteria for admission and discharge between countries, and different timing within the pandemic. In the absence of local data, the combined summary LoS distributions provided here can be used to model bed demands for contingency planning and then updated, with the novel method presented here, as more studies with aggregated statistics emerge outside China.

Project description: Introduction: Effective planning of elective surgical procedures requiring postoperative intensive care is important in preventing cancellations and empty intensive care unit (ICU) beds. To improve planning, we constructed, validated and tested three models designed to predict length of stay (LOS) in the ICU in individual patients. Methods: Retrospective data were collected from 518 consecutive patients who underwent oesophagectomy with reconstruction for carcinoma between January 1997 and April 2005. Three multivariable linear regression models for LOS, namely preoperative, postoperative and intra-ICU, were constructed using these data. Internal validation was assessed using bootstrap sampling in order to obtain validated estimates of the explained variance (r²). To determine the potential gain of the best performing model in day-to-day clinical practice, prospective data from a second cohort of 65 consecutive patients undergoing oesophagectomy between May 2005 and April 2006 were used in the model, and the predictive performance of the model was compared with prediction based on mean LOS. Results: The intra-ICU model had an r² of 45% after internal validation. Important prognostic variables for LOS included greater patient age, comorbidity, type of surgical approach, intraoperative respiratory minute volume and complications occurring within 72 hours in the ICU. The potential gain of the best model in day-to-day clinical practice was determined relative to mean LOS. Use of the model reduced the deficit number (underestimation) of ICU days by 65 and increased the excess number (overestimation) of ICU days by 23 for the cohort of 65 patients. A conservative analysis conducted in the second, prospective cohort of patients revealed that 7% more oesophagectomies could have been accommodated, and 15% of cancelled procedures could have been prevented. Conclusion: Patient characteristics can be used to create models that will help in predicting LOS in the ICU. This will result in more efficient use of ICU beds and fewer cancellations.

Project description:Machine learning can predict outcomes and determine variables contributing to precise prediction, and can thus classify patients with different risk factors of outcomes. This study aimed to investigate the predictive accuracy for mortality and length of stay in intensive care unit (ICU) patients using machine learning, and to identify the variables contributing to the precise prediction or classification of patients. Patients (n = 12,747) admitted to the ICU at Chiba University Hospital were randomly assigned to the training and test cohorts. After learning using the variables on admission in the training cohort, the area under the curve (AUC) was analyzed in the test cohort to evaluate the predictive accuracy of the supervised machine learning classifiers, including random forest (RF) for outcomes (primary outcome, mortality; secondary outcome, length of ICU stay). The rank of the variables that contributed to the machine learning prediction was confirmed, and cluster analysis of the patients with risk factors of mortality was performed to identify the important variables associated with patient outcomes. Machine learning using RF revealed a high predictive value for mortality, with an AUC of 0.945 (95% confidence interval [CI] 0.922-0.977). In addition, RF showed high predictive value for short and long ICU stays, with AUCs of 0.881 (95% CI 0.876-0.908) and 0.889 (95% CI 0.849-0.936), respectively. Lactate dehydrogenase (LDH) was identified as a variable contributing to the precise prediction in machine learning for both mortality and length of ICU stay. LDH was also identified as a contributing variable to classify patients into sub-populations based on different risk factors of mortality. The machine learning algorithm could predict mortality and length of stay in ICU patients with high accuracy. 
LDH was identified as a contributing variable in mortality and length of ICU stay prediction and could be used to classify patients based on mortality risk.

Project description: Background: Predicting hospital length of stay (LoS) for patients with COVID-19 infection is essential to ensure that adequate bed capacity can be provided without unnecessarily restricting care for patients with other conditions. Here, we demonstrate the utility of three complementary methods for predicting LoS using UK national- and hospital-level data. Method: On a national scale, relevant patients were identified from the COVID-19 Hospitalisation in England Surveillance System (CHESS) reports. An Accelerated Failure Time (AFT) survival model and a truncation corrected method (TC), both with underlying Weibull distributions, were fitted to the data to estimate LoS from hospital admission date to an outcome (death or discharge) and from hospital admission date to Intensive Care Unit (ICU) admission date. In a second approach we fit a multi-state (MS) survival model to data directly from the Manchester University NHS Foundation Trust (MFT). We develop a planning tool that uses LoS estimates from these models to predict bed occupancy. Results: All methods produced similar overall estimates of LoS for overall hospital stay, given a patient is not admitted to ICU (8.4, 9.1 and 8.0 days for AFT, TC and MS, respectively). Estimates differ more significantly between the local and national level when considering ICU. National estimates for ICU LoS from AFT and TC were 12.4 and 13.4 days, whereas in local data the MS method produced estimates of 18.9 days. Conclusions: Given the complexity and partiality of different data sources and the rapidly evolving nature of the COVID-19 pandemic, it is most appropriate to use multiple analysis methods on multiple datasets. The AFT method accounts for censored cases, but does not allow for simultaneous consideration of different outcomes. The TC method does not include censored cases, instead correcting for truncation in the data, but does consider these different outcomes. The MS method can model complex pathways to different outcomes whilst accounting for censoring, but cannot handle non-random case missingness. Overall, we conclude that data-driven modelling approaches to LoS using these methods are useful in epidemic planning and management, and should be considered for widespread adoption throughout healthcare systems internationally where similar data resources exist.

Project description: Background: Postoperative length of stay is a key indicator in the management of medical resources and an indirect predictor of the incidence of surgical complications and the degree of recovery of the patient after cancer surgery. Recently, machine learning has been used to predict complex medical outcomes, such as prolonged length of hospital stay, using extensive medical information. Objective: The objective of this study was to develop a prediction model for prolonged length of stay after cancer surgery using a machine learning approach. Methods: In our retrospective study, electronic health records (EHRs) from 42,751 patients who underwent primary surgery for 17 types of cancer between January 1, 2000, and December 31, 2017, were sourced from a single cancer center. The EHRs included numerous variables such as surgical factors, cancer factors, underlying diseases, functional laboratory assessments, general assessments, medications, and social factors. To predict prolonged length of stay after cancer surgery, we employed extreme gradient boosting classifier, multilayer perceptron, and logistic regression models. Prolonged postoperative length of stay for cancer was defined as bed-days of the group of patients who accounted for the top 50% of the distribution of bed-days by cancer type. Results: In the prediction of prolonged length of stay after cancer surgery, extreme gradient boosting classifier models demonstrated excellent performance for kidney and bladder cancer surgeries (area under the receiver operating characteristic curve [AUC] >0.85). A moderate performance (AUC 0.70-0.85) was observed for stomach, breast, colon, thyroid, prostate, cervix uteri, corpus uteri, and oral cancers. For stomach, breast, colon, thyroid, and lung cancers, with more than 4000 cases each, the extreme gradient boosting classifier model showed slightly better performance than the logistic regression model, although the logistic regression model also performed adequately. We identified risk variables for the prediction of prolonged postoperative length of stay for each type of cancer, and the importance of the variables differed depending on the cancer type. After we added operative time to the models trained on preoperative factors, the models generally outperformed the corresponding models using only preoperative variables. Conclusions: A machine learning approach using EHRs may improve the prediction of prolonged length of hospital stay after primary cancer surgery. This algorithm may help to provide a more effective allocation of medical resources in cancer surgery.
