Dataset Information

Personalized stratification of hospitalization risk amidst COVID-19: A machine learning approach.

ABSTRACT: Objective: In the wake of COVID-19, the United States (U.S.) developed a three-stage plan to outline the parameters to determine when states may reopen businesses and ease travel restrictions. The guidelines also identify subpopulations of Americans deemed to be at high risk for severe disease should they contract COVID-19. These guidelines were based on population-level demographics, rather than individual-level risk factors. As such, they may misidentify individuals at high risk for severe illness, and may therefore be of limited use in decisions surrounding resource allocation to vulnerable populations. The objective of this study was to evaluate a machine learning algorithm for prediction of serious illness due to COVID-19 using inpatient data collected from electronic health records. Methods: The algorithm was trained to identify patients for whom a diagnosis of COVID-19 was likely to result in hospitalization, and compared against four U.S. policy-based criteria: age over 65; having a serious underlying health condition; age over 65 or having a serious underlying health condition; and age over 65 and having a serious underlying health condition. Results: This algorithm identified 80% of patients at risk for hospitalization due to COVID-19, versus 62% identified by government guidelines. The algorithm also achieved a high specificity of 95%, outperforming government guidelines. Conclusions: This algorithm may identify individuals likely to require hospitalization should they contract COVID-19. This information may be useful to guide vaccine distribution, anticipate hospital resource needs, and assist health care policymakers in making care decisions in a more principled manner.
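
The comparison against the four policy-based criteria can be illustrated with a minimal Python sketch. It is not the authors' code: the `Patient` fields, the boolean `has_condition` flag, and the toy cohort are assumptions invented for illustration, and the numbers it produces are unrelated to the study's results. It only shows how sensitivity and specificity would be computed for each rule.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    has_condition: bool  # hypothetical flag: serious underlying health condition
    hospitalized: bool   # observed outcome


# The four policy-based criteria named in the abstract, written as rules.
CRITERIA = {
    "age > 65": lambda p: p.age > 65,
    "condition": lambda p: p.has_condition,
    "age > 65 OR condition": lambda p: p.age > 65 or p.has_condition,
    "age > 65 AND condition": lambda p: p.age > 65 and p.has_condition,
}


def sensitivity_specificity(patients, rule):
    """Return (sensitivity, specificity) of a rule that flags high-risk patients."""
    tp = sum(1 for p in patients if rule(p) and p.hospitalized)
    fn = sum(1 for p in patients if not rule(p) and p.hospitalized)
    tn = sum(1 for p in patients if not rule(p) and not p.hospitalized)
    fp = sum(1 for p in patients if rule(p) and not p.hospitalized)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy cohort, invented for illustration only.
cohort = [
    Patient(72, True, True),
    Patient(70, False, True),
    Patient(50, True, True),
    Patient(40, False, True),
    Patient(68, False, False),
    Patient(30, True, False),
    Patient(25, False, False),
    Patient(45, False, False),
]

results = {name: sensitivity_specificity(cohort, rule) for name, rule in CRITERIA.items()}
for name, (sens, spec) in results.items():
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy cohort the OR rule trades specificity for sensitivity and the AND rule does the opposite, which is the kind of trade-off a learned risk score is meant to improve on.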

SUBMITTER: Lam C

PROVIDER: S-EPMC8333026 | biostudies-literature | 2021 Sep

REPOSITORIES: biostudies-literature

Publications

Personalized stratification of hospitalization risk amidst COVID-19: A machine learning approach.

Carson Lam, Jacob Calvert, Anna Siefkas, Gina Barnes, Emily Pellegrini, Abigail Green-Saxena, Jana Hoffman, Qingqing Mao, Ritankar Das

Health Policy and Technology, 2021 Aug 4, issue 3

PMID: 34367900

Similar Datasets

Project description: Background and aim: COVID-19 can present with various gastrointestinal symptoms. Shortly after the pandemic outbreak, several machine learning algorithms were implemented to assess new diagnostic and therapeutic methods for this disease. The aim of this study is to assess gastrointestinal and liver-related predictive factors for SARS-CoV-2 associated risk of hospitalization. Methods: Data collection was based on a questionnaire from the COVID-19 outpatient test center and from the emergency department at the University Hospital, in combination with data from the internal hospital information system and from a mobile application used for telemedicine follow-up of patients. For statistical analysis, SARS-CoV-2 negative patients were considered as controls in three different SARS-CoV-2 positive patient groups (divided based on severity of the disease). The data were visualized and analyzed in R version 4.0.5. The Chi-squared or Fisher test was applied to test the null hypothesis of independence between the factors, followed, where appropriate, by multiple comparisons with the Benjamini-Hochberg adjustment. The null hypothesis of the equality of the population medians of a continuous variable was tested by the Kruskal-Wallis test, followed by the Dunn multiple comparisons test. In order to assess the predictive power of the gastrointestinal parameters and other measured variables for predicting the patient group, the Random Forest machine learning algorithm was trained on the data. The predictive ability was quantified by the ROC curve, constructed from the Out-of-Bag data. The Matthews correlation coefficient was used as a one-number summary of the quality of binary classification. The importance of the predictors was measured using the Variable Importance. A 2D representation of the data was obtained by means of Principal Component Analysis for mixed-type data. Findings with a p-value below 0.05 were considered statistically significant. Results: A total of 710 patients were enrolled in the study. The presence of diarrhea and nausea was significantly higher in the emergency department group than in the COVID-19 outpatient test center. Among liver enzymes, only aspartate transaminase (AST) was significantly elevated in the hospitalized group compared to patients discharged home. Based on the Random Forest algorithm, AST was identified as the most important predictor, followed by age or diabetes mellitus. Diarrhea and bloating also have predictive importance, although much lower than AST. Conclusion: SARS-CoV-2 positivity is connected with isolated AST elevation, and the level is linked with the severity of the disease. Furthermore, using the machine learning Random Forest algorithm, we have identified elevated AST as the most important predictor for COVID-19 related hospitalizations.
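
The Matthews correlation coefficient mentioned above as a one-number summary of binary classification can be computed directly from the four confusion-matrix counts. The following is a minimal Python sketch (the study itself used R); the function name and the example counts are illustrative assumptions, not values from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction). Returns 0.0 when any marginal count
    is zero, a common convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
print(round(matthews_corrcoef(tp=3, tn=2, fp=2, fn=1), 3))
```

Unlike accuracy, this coefficient uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.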

Project description: Objective: Clinical triage in coronavirus disease 2019 (COVID-19) places a heavy burden on senior clinicians during a pandemic situation. However, risk stratification based on serum biomarker bioprofiling could be implemented by a larger, nonspecialist workforce. Method: Measures of Complement Activation and inflammation in patientS with CoronAvirus DisEase 2019 (CASCADE) patients (n = 72), (clinicaltrials.gov: NCT04453527), classified as mild, moderate, or severe (by support needed to maintain SpO2 > 93%), and healthy controls (HC, n = 20), were bioprofiled using 76 immunological biomarkers and compared using ANOVA. Spearman correlation analysis on biomarker pairs was visualised via heatmaps. Linear Discriminant Analysis (LDA) models were generated to identify patients likely to deteriorate. An X-Gradient-boost (XGB) model trained on CASCADE data to triage patients as mild, moderate, and severe was retrospectively employed to classify COROnavirus Nomacopan Emergency Treatment for covid 19 infected patients with early signs of respiratory distress (CORONET) patients (n = 7) treated with nomacopan. Results: The LDA models distinctly discriminated between deteriorators, nondeteriorators, and HC, with IL-27, IP-10, MDC, ferritin, C5, and sC5b-9 among the key predictor variables during deterioration. C3a and C5 were elevated in all severity classes vs. HC (p < 0.05). sC5b-9 was elevated in the "moderate" and "severe" categories vs. HC (p < 0.001). Heatmap analysis shows a pairwise increase of negatively correlated pairs with IL-27. The XGB model indicated sC5b-9, IL-8, MCP1, and prothrombin F1 and F2 were key discriminators in nomacopan-treated patients (CORONET study). Conclusion: Distinct immunological fingerprints from serum biomarkers exist within different severity classes of COVID-19, and harnessing them using machine learning enabled the development of clinically useful triage and prognostic tools. Complement-mediated lung injury plays a key role in COVID-19 pneumonia, and preliminary results hint at the usefulness of a C5 inhibitor in COVID-19 recovery.

Project description: Background: The COVID-19 pandemic has led to an increased demand for health care resources and, in some cases, shortage of medical equipment and staff. Our objective was to develop and validate a multivariable model to predict risk of hospitalization for patients infected with SARS-CoV-2. Methods: We used routinely collected health records in a patient cohort to develop and validate our prediction model. This cohort included adult patients (age ≥ 18 yr) from Ontario, Canada, who tested positive for SARS-CoV-2 ribonucleic acid by polymerase chain reaction between Feb. 2 and Oct. 5, 2020, and were followed up through Nov. 5, 2020. Patients living in long-term care facilities were excluded, as they were all assumed to be at high risk of hospitalization for COVID-19. Risk of hospitalization within 30 days of diagnosis of SARS-CoV-2 infection was estimated via gradient-boosting decision trees, and variable importance examined via Shapley values. We built a gradient-boosting model using the Extreme Gradient Boosting (XGBoost) algorithm and compared its performance against 4 empirical rules commonly used for risk stratifications based on age and number of comorbidities. Results: The cohort included 36 323 patients with 2583 hospitalizations (7.1%). Hospitalized patients had a higher median age (64 yr v. 43 yr), were more likely to be male (56.3% v. 47.3%) and had a higher median number of comorbidities (3, interquartile range [IQR] 2-6 v. 1, IQR 0-3) than nonhospitalized patients. Patients were split into development (n = 29 058, 80.0%) and held-out validation (n = 7265, 20.0%) cohorts. The gradient-boosting model achieved high discrimination (development cohort: area under the receiver operating characteristic curve across the 5 folds of 0.852; validation cohort: 0.8475) and strong calibration (slope = 1.01, intercept = -0.01). The patients who scored at the top 10% captured 47.4% of hospitalizations, and those who scored at the top 30% captured 80.6%. Interpretation: We developed and validated an accurate risk stratification model using routinely collected health administrative data. We envision that modelling such risk stratification based on routinely collected health data could support management of COVID-19 on a population health level.
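
The "top 10% captured 47.4% of hospitalizations" style of result is a capture rate: rank patients by predicted risk, keep the highest-scoring fraction, and ask what share of all observed events falls in that group. The Python sketch below is one way to compute it; the function name, the rounding-up convention for the group size, and the toy data are assumptions rather than details taken from the study.

```python
def capture_rate(scores, outcomes, top_percent):
    """Share of all events found among the top `top_percent` of scores.

    scores:      predicted risks, higher means riskier
    outcomes:    1 if the event (e.g. hospitalization) occurred, else 0
    top_percent: integer percentage of the cohort to keep, 1..100
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    total_events = sum(outcomes)
    if total_events == 0:
        raise ValueError("no events to capture")
    # Integer ceiling of n * top_percent / 100, keeping at least one patient.
    n_top = max(1, (len(scores) * top_percent + 99) // 100)
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    captured = sum(outcome for _, outcome in ranked[:n_top])
    return captured / total_events


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(capture_rate(scores, outcomes, 30))
```

A capture rate well above the kept fraction (for example, 10% of patients holding nearly half of the events) indicates that the score concentrates risk usefully for resource planning.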

Project description: Background: Diabetic ketoacidosis (DKA) is the leading cause of morbidity and mortality in pediatric type 1 diabetes (T1D), occurring in approximately 20% of patients, with an economic cost of $5.1 billion/year in the United States. Despite multiple risk factors for postdiagnosis DKA, there is still a need for explainable, clinic-ready models that accurately predict DKA hospitalization in established patients with pediatric T1D. Objective: We aimed to develop an interpretable machine learning model to predict the risk of postdiagnosis DKA hospitalization in children with T1D using routinely collected time-series of electronic health record (EHR) data. Methods: We conducted a retrospective case-control study using EHR data from 1787 patients from among 3794 patients with T1D treated at a large tertiary care US pediatric health system from January 2010 to June 2018. We trained a state-of-the-art, explainable, gradient-boosted ensemble (XGBoost) of decision trees with 44 regularly collected EHR features to predict postdiagnosis DKA. We measured the model's predictive performance using the area under the receiver operating characteristic curve, weighted F1-score, weighted precision, and recall, in a 5-fold cross-validation setting. We analyzed Shapley values to interpret the learned model and gain insight into its predictions. Results: Our model distinguished the cohort that develops DKA postdiagnosis from the one that does not (P<.001). It predicted postdiagnosis DKA risk with an area under the receiver operating characteristic curve of 0.80 (SD 0.04), a weighted F1-score of 0.78 (SD 0.04), and a weighted precision and recall of 0.83 (SD 0.03) and 0.76 (SD 0.05), respectively, using a relatively short history of data from routine clinic follow-ups post diagnosis. On analyzing Shapley values of the model output, we identified key risk factors predicting postdiagnosis DKA both at the cohort and individual levels. We observed sharp changes in postdiagnosis DKA risk with respect to 2 key features (diabetes age and glycated hemoglobin at 12 months), yielding time intervals and glycated hemoglobin cutoffs for potential intervention. By clustering model-generated Shapley values, we automatically stratified the cohort into 3 groups with 5%, 20%, and 48% risk of postdiagnosis DKA. Conclusions: We have built an explainable, predictive, machine learning model with potential for integration into clinical workflow. The model risk-stratifies patients with pediatric T1D and identifies patients with the highest postdiagnosis DKA risk using limited follow-up data starting from the time of diagnosis. The model identifies key time points and risk factors to direct clinical interventions at both the individual and cohort levels. Further research with data from multiple hospital systems can help us assess how well our model generalizes to other populations. The clinical importance of our work is that the model can predict patients most at risk for postdiagnosis DKA and identify preventive interventions based on mitigation of individualized risk factors.
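
The weighted F1-score reported above averages the per-class F1 values, weighting each class by its number of true instances. The Python sketch below shows that calculation in plain code; it is an illustration under that standard definition, not the study's evaluation code, and the labels in the example are invented.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        score += (count / total) * f1
    return score


# Invented labels: 1 = event, 0 = no event.
print(round(weighted_f1([1, 1, 1, 0], [1, 1, 0, 0]), 4))
```

Weighting by support keeps the summary from being dominated by a small class, which matters in case-control data where the class sizes differ.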
