Dataset Information

Machine Learning Approaches to Predict 6-Month Mortality Among Patients With Cancer.


ABSTRACT: Importance: Machine learning algorithms could identify patients with cancer who are at risk of short-term mortality. However, it is unclear how different machine learning algorithms compare and whether they could prompt clinicians to have timely conversations about treatment and end-of-life preferences. Objectives: To develop, validate, and compare machine learning algorithms that use structured electronic health record data before a clinic visit to predict mortality among patients with cancer. Design, Setting, and Participants: Cohort study of 26 525 adult patients who had outpatient oncology or hematology/oncology encounters at a large academic cancer center and 10 affiliated community practices between February 1, 2016, and July 1, 2016. Patients were not required to receive cancer-directed treatment. Patients were observed for up to 500 days after the encounter. Data analysis took place between October 1, 2018, and September 1, 2019. Exposures: Logistic regression, gradient boosting, and random forest algorithms. Main Outcomes and Measures: Primary outcome was 180-day mortality from the index encounter; secondary outcome was 500-day mortality from the index encounter. Results: Among 26 525 patients in the analysis, 1065 (4.0%) died within 180 days of the index encounter. Among those who died, the mean age was 67.3 (95% CI, 66.5-68.0) years, and 500 (47.0%) were women. Among those who were alive at 180 days, the mean age was 61.3 (95% CI, 61.1-61.5) years, and 15 922 (62.5%) were women. The population was randomly partitioned into training (18 567 [70.0%]) and validation (7958 [30.0%]) cohorts at the patient level, and a randomly selected encounter was included in either the training or validation set. At a prespecified alert rate of 0.02, positive predictive values were higher for the random forest (51.3%) and gradient boosting (49.4%) algorithms compared with the logistic regression algorithm (44.7%). There was no significant difference in discrimination among the random forest (area under the receiver operating characteristic curve [AUC], 0.88; 95% CI, 0.86-0.89), gradient boosting (AUC, 0.87; 95% CI, 0.85-0.89), and logistic regression (AUC, 0.86; 95% CI, 0.84-0.88) models (P for comparison = .02). In the random forest model, observed 180-day mortality was 51.3% (95% CI, 43.6%-58.8%) in the high-risk group vs 3.4% (95% CI, 3.0%-3.8%) in the low-risk group; at 500 days, observed mortality was 64.4% (95% CI, 56.7%-71.4%) in the high-risk group and 7.6% (95% CI, 7.0%-8.2%) in the low-risk group. In a survey of 15 oncology clinicians with a 52.1% response rate, 100 of 171 patients (58.8%) who had been flagged as having high risk by the gradient boosting algorithm were deemed appropriate for a conversation about treatment and end-of-life preferences in the upcoming week. Conclusions and Relevance: In this cohort study, machine learning algorithms based on structured electronic health record data accurately identified patients with cancer at risk of short-term mortality. When the gradient boosting algorithm was applied in real time, clinicians believed that most patients who had been identified as having high risk were appropriate for a timely conversation about treatment and end-of-life preferences.
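
The two headline metrics in the abstract, positive predictive value at a fixed alert rate and AUC, can be illustrated with a minimal sketch. It assumes that an alert rate of 0.02 means flagging the 2% of encounters with the highest predicted risk, which the abstract does not spell out. The function names and the toy data below are hypothetical and are not taken from the study.

```python
def ppv_at_alert_rate(scores, died, alert_rate=0.02):
    """Positive predictive value among flagged encounters.

    Assumption (not stated in the abstract): the flagged group is the top
    `alert_rate` fraction of encounters ranked by predicted risk.
    """
    if len(scores) != len(died):
        raise ValueError("scores and died must have the same length")
    n_flagged = max(1, round(alert_rate * len(scores)))
    # Rank encounters from highest to lowest predicted risk.
    ranked = sorted(zip(scores, died), key=lambda pair: pair[0], reverse=True)
    return sum(outcome for _, outcome in ranked[:n_flagged]) / n_flagged


def auc(scores, died):
    """AUC as the probability that a randomly chosen patient who died is
    scored above a randomly chosen survivor (ties count as 0.5)."""
    pos = [s for s, d in zip(scores, died) if d == 1]
    neg = [s for s, d in zip(scores, died) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 100 encounters, predicted risk 0.00 to 0.99,
# two deaths (one scored highest, one scored mid-range).
toy_scores = [i / 100 for i in range(100)]
toy_died = [0] * 100
toy_died[99] = 1
toy_died[50] = 1

print(ppv_at_alert_rate(toy_scores, toy_died))  # 0.5
print(round(auc(toy_scores, toy_died), 3))      # 0.755
```

With these toy values, an alert rate of 0.02 flags two encounters, one of which ended in death, so the positive predictive value is 0.5. The pairwise AUC is quadratic in cohort size and is meant only to show the definition; a rank-based implementation would be preferable for a cohort of the size reported here.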

SUBMITTER: Parikh RB 

PROVIDER: S-EPMC6822091 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC10988375 | biostudies-literature
| S-EPMC9157269 | biostudies-literature
| S-EPMC7674405 | biostudies-literature
| S-EPMC8157228 | biostudies-literature
| S-EPMC8076284 | biostudies-literature
| S-EPMC7006166 | biostudies-literature
| S-EPMC9709414 | biostudies-literature
| S-EPMC10329647 | biostudies-literature
| S-EPMC6210196 | biostudies-other
| S-EPMC9052482 | biostudies-literature