ABSTRACT: Importance: Patients with cancer who die soon after starting chemotherapy incur the costs of treatment without its benefits. Accurately predicting mortality risk before administering chemotherapy is important, but few tools driven by patient data exist. Objective: To create and validate a machine learning model that predicts mortality in a general oncology cohort starting new chemotherapy, using only data available before the first day of treatment. Design, Setting, and Participants: This retrospective cohort study of patients at a large academic cancer center from January 1, 2004, through December 31, 2014, determined date of death by linkage to Social Security data. The model was derived using data from 2004 through 2011, and performance was measured on nonoverlapping data from 2012 through 2014. The analysis was conducted from June 1 through August 1, 2017. Participants included 26 946 patients starting 51 774 new chemotherapy regimens. Main Outcomes and Measures: Thirty-day mortality from the first day of a new chemotherapy regimen. Secondary outcomes included model discrimination by predicted mortality risk decile among patients receiving palliative chemotherapy, and 180-day mortality from the first day of a new chemotherapy regimen. Results: Among the 26 946 patients included in the analysis, mean age was 58.7 years (95% CI, 58.5-58.9 years); 61.1% were female (95% CI, 60.4%-61.9%); and 86.9% were white (95% CI, 86.4%-87.4%). Thirty-day mortality from chemotherapy start was 2.1% (95% CI, 1.9%-2.4%). Among the 9114 patients in the validation set, the most common primary cancers were breast (21.1%; 95% CI, 20.2%-21.9%), colorectal (19.3%; 95% CI, 18.5%-20.2%), and lung (18.0%; 95% CI, 17.2%-18.8%). Model predictions were accurate for all patients (area under the curve [AUC], 0.940; 95% CI, 0.930-0.951).
Predictions for patients starting palliative chemotherapy (46.6% of regimens; 95% CI, 45.8%-47.3%), for whom prognosis is particularly important, remained highly accurate (AUC, 0.924; 95% CI, 0.910-0.939). To illustrate model discrimination, patients initiating palliative chemotherapy were ranked by model-predicted mortality risk, and observed mortality was calculated by risk decile. Thirty-day mortality in the highest-risk decile was 22.6% (95% CI, 19.6%-25.6%); in the lowest-risk decile, no patients died. Predictions remained accurate across all primary cancers, stages, and chemotherapies, even for clinical trial regimens that first appeared in the years after the model was trained (AUC, 0.942; 95% CI, 0.882-1.000). The same model also performed well for prediction of 180-day mortality (AUC for all patients, 0.870 [95% CI, 0.862-0.877]; highest- vs lowest-risk decile mortality, 74.8% [95% CI, 72.7%-77.0%] vs 0.2% [95% CI, 0.01%-0.4%]). Predictions were more accurate than estimates from randomized clinical trials of individual chemotherapies or from the Surveillance, Epidemiology, and End Results data set. Conclusions and Relevance: A machine learning algorithm using electronic health record data accurately predicted short-term mortality among patients starting chemotherapy. Further research is necessary to determine the generalizability and feasibility of applying this algorithm in clinical settings.
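The two discrimination summaries reported above, the AUC and observed mortality by predicted-risk decile, can be illustrated with a minimal sketch. This is not the authors' code, and the model and its input features are not shown: the function names, the toy scores and outcomes, and the choice of a pairwise (Mann-Whitney) AUC estimator are assumptions made only for illustration. Each regimen is taken to have a predicted risk score and a binary outcome (1 = died within the horizon, 0 = alive).

```python
def auc_mann_whitney(scores, labels):
    """AUC as the probability that a randomly chosen death receives a higher
    predicted risk than a randomly chosen survivor (ties count as 1/2).
    Illustrative O(n_pos * n_neg) estimator, not the study's implementation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mortality_by_risk_decile(scores, labels, n_bins=10):
    """Rank regimens by predicted risk (ascending), split them into n_bins
    near-equal groups, and return the observed mortality in each group,
    lowest-risk group first. Hypothetical helper for illustration."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    rates = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            rates.append(float("nan"))  # empty group when n < n_bins
            continue
        rates.append(sum(labels[i] for i in idx) / len(idx))
    return rates


# Toy data (made up): 20 regimens, the two highest-risk ones ended in death.
scores = [i / 20 for i in range(20)]
labels = [0] * 18 + [1, 1]
auc = auc_mann_whitney(scores, labels)
rates = mortality_by_risk_decile(scores, labels)
```

In this toy example every death outranks every survivor, so the AUC is 1.0 and all observed deaths fall in the top decile. Real cohorts would also need confidence intervals (for example, by bootstrapping), which this sketch omits.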