Dataset Information

Applying machine learning to predict real-world individual treatment effects: insights from a virtual patient cohort.

ABSTRACT:

Objective

We aimed to investigate bias in applying machine learning to predict real-world individual treatment effects.

Materials and methods

Using a virtual patient cohort, we simulated real-world healthcare data and applied random forest and gradient boosting classifiers to develop prediction models. Treatment effect was estimated as the difference between the predicted outcomes of a treatment and a control. We evaluated the impact of predictors (ie, treatment predictors [X1], confounders [X2], treatment effects modifiers [X3], and other outcome risk factors [X4]) with known effects on treatment and outcome using real-world data, and outcome imbalance on predicting individual outcome. Using counterfactuals, we evaluated percentage of patients with biased predicted individual treatment effects.
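The estimation approach described above, taking the difference between a model's predicted outcome under treatment and under control, can be sketched in a few lines. This is only an illustration under loud assumptions: the cohort, its probabilities, and all function names are hypothetical, and a simple cell-mean frequency table stands in for the random forest and gradient boosting classifiers used in the study. The known simulated risk difference plays the role of the counterfactual ground truth.

```python
import random
from collections import defaultdict


def simulate_cohort(n, seed=0):
    """Simulate a hypothetical virtual cohort with a known treatment effect."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        x2 = int(rng.random() < 0.5)  # confounder: drives treatment and outcome
        x3 = int(rng.random() < 0.5)  # effect modifier: changes the effect size
        t = int(rng.random() < (0.8 if x2 else 0.2))  # confounded assignment
        true_effect = -(0.1 + 0.1 * x3)  # known risk difference (assumed values)
        p_outcome = 0.3 + 0.3 * x2 + (true_effect if t else 0.0)
        y = int(rng.random() < p_outcome)
        cohort.append(
            {"x2": x2, "x3": x3, "t": t, "y": y, "true_effect": true_effect}
        )
    return cohort


def fit_outcome_model(cohort, features):
    """Estimate P(outcome | features, treatment) as the mean outcome per cell."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [sum of outcomes, count]
    for p in cohort:
        key = tuple(p[f] for f in features) + (p["t"],)
        totals[key][0] += p["y"]
        totals[key][1] += 1
    return {key: s / c for key, (s, c) in totals.items()}


def predict_effect(model, patient, features):
    """Predicted outcome under treatment minus predicted outcome under control."""
    x = tuple(patient[f] for f in features)
    return model[x + (1,)] - model[x + (0,)]


def mean_abs_bias(cohort, features):
    """Average absolute gap between predicted and known treatment effects."""
    model = fit_outcome_model(cohort, features)
    errors = [
        abs(predict_effect(model, p, features) - p["true_effect"]) for p in cohort
    ]
    return sum(errors) / len(errors)


cohort = simulate_cohort(40000, seed=1)
bias_full = mean_abs_bias(cohort, ["x2", "x3"])     # confounder included
bias_no_confounder = mean_abs_bias(cohort, ["x3"])  # confounder omitted
print("mean absolute bias, all features:", round(bias_full, 3))
print("mean absolute bias, confounder omitted:", round(bias_no_confounder, 3))
```

In this toy setting, leaving out the confounder shifts the predicted effects away from the known ones, because treated and untreated patients differ in baseline risk. It does not reproduce the study's finding that bias persisted even with all features included.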

Results

The X4 had relatively more impact on model performance than X2 and X3 did. No effects were observed from X1. Moderate-to-severe outcome imbalance had a significantly negative impact on model performance, particularly among subgroups in which an outcome occurred. Bias in predicting individual treatment effects was significant and persisted even when the models had a 100% accuracy in predicting health outcome.
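One reason outcome imbalance can hide poor performance in the subgroup where the outcome occurred is that overall accuracy is dominated by the majority class. The following sketch uses made-up numbers (a 5% outcome rate and a degenerate model that never predicts the outcome; neither comes from the study) and a hypothetical helper, `classification_summary`, to show accuracy and sensitivity diverging.

```python
def classification_summary(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: performance among patients in whom the outcome occurred.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: performance among patients without the outcome.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Illustrative imbalanced cohort: 50 patients with the outcome, 950 without.
y_true = [1] * 50 + [0] * 950
# Degenerate model that always predicts "no outcome".
y_pred = [0] * 1000

summary = classification_summary(y_true, y_pred)
print(summary)
```

Reporting sensitivity and specificity alongside accuracy, as in this sketch, is one way to make subgroup-specific degradation visible.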

Discussion

Inadequate inclusion of the X2, X3, and X4 and moderate-to-severe outcome imbalance may affect model performance in predicting individual outcome and subsequently bias in predicting individual treatment effects. Machine learning models with all features and high performance for predicting individual outcome still yielded biased individual treatment effects.

Conclusions

Direct application of machine learning might not adequately address bias in predicting individual treatment effects. Further method development is needed to advance machine learning to support individualized treatment selection.

SUBMITTER: Fang G

PROVIDER: S-EPMC7647181 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Publications

Applying machine learning to predict real-world individual treatment effects: insights from a virtual patient cohort.

Gang Fang, Izabela E Annis, Jennifer Elston-Lafata, Samuel Cykert

Journal of the American Medical Informatics Association: JAMIA, 2019 Oct 1, issue 10

PMID: 31220274

Similar Datasets

Project description: Background: Distant metastasis of gastric cancer can seriously affect treatment strategy, so it is essential to identify patients at high risk of distant metastasis early. Method: In this study, we retrospectively collected data on 18,472 gastric cancer patients from the SEER database. We applied six machine learning algorithms to construct a model that can predict distant metastasis of gastric cancer, using 10-fold cross-validation. We evaluated the model using the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), decision curve analysis, and calibration curves. In addition, we used SHapley Additive exPlanations (SHAP) to interpret the machine learning model. We used data from 1595 gastric cancer patients in the First Hospital of Jilin University for external validation. We plotted correlation heat maps of the predictor variables. We selected an optimal model and constructed a web-based online calculator for predicting the risk of distant metastasis of gastric cancer. Result: The study included 18,472 patients with gastric cancer from the SEER database, including 4,202 (22.75%) patients with distant metastases. Multivariate logistic regression analysis showed that age, race, grade of differentiation, tumor size, T stage, radiotherapy, and chemotherapy were independent risk factors for distant metastasis of gastric cancer. In the ten-fold cross-validation of the training set, the average AUC of the random forest (RF) model was 0.80. The RF model performed best in the internal test set and the external validation set. The RF model had an AUC of 0.80, an AUPRC of 0.555, an accuracy of 0.81, and a precision of 0.78 in the internal test set, and an AUC of 0.76, an AUPRC of 0.496, an accuracy of 0.82, and a precision of 0.81 in the external validation set. Finally, we constructed a web-based calculator for distant metastasis of gastric cancer using the RF model. Conclusion: Using pathological and clinical indicators, we constructed a well-performing RF model for predicting the risk of distant metastasis in gastric cancer patients to help clinicians make clinical decisions.

Project description: Background: Metastasis in the lungs is common in patients with rectal cancer, and it can have severe consequences on their survival and quality of life. Therefore, it is essential to identify patients who may be at risk of developing lung metastasis from rectal cancer. Methods: In this study, we utilized eight machine-learning methods to create a model for predicting the risk of lung metastasis in patients with rectal cancer. Our cohort consisted of 27,180 rectal cancer patients selected from the Surveillance, Epidemiology and End Results (SEER) database between 2010 and 2017 for model development. Additionally, we validated our models using 1118 rectal cancer patients from a Chinese hospital to evaluate model performance and generalizability. We assessed our models' performance using various metrics, including the area under the curve (AUC), the area under the precision-recall curve (AUPR), the Matthews Correlation Coefficient (MCC), decision curve analysis (DCA), and calibration curves. Finally, we applied the best model to develop a web-based calculator for predicting the risk of lung metastasis in patients with rectal cancer. Result: Our study employed tenfold cross-validation to assess the performance of eight machine-learning models for predicting the risk of lung metastasis in patients with rectal cancer. The AUC values ranged from 0.73 to 0.96 in the training set, with the extreme gradient boosting (XGB) model achieving the highest AUC value of 0.96. Moreover, the XGB model obtained the best AUPR and MCC in the training set, reaching 0.98 and 0.88, respectively. We found that the XGB model demonstrated the best predictive power, achieving an AUC of 0.87, an AUPR of 0.60, an accuracy of 0.92, and a sensitivity of 0.93 in the internal test set. Furthermore, the XGB model was evaluated in the external test set and achieved an AUC of 0.91, an AUPR of 0.63, an accuracy of 0.93, a sensitivity of 0.92, and a specificity of 0.93. The XGB model obtained the highest MCC in the internal test set and external validation set, with 0.61 and 0.68, respectively. Based on the DCA and calibration curve analysis, the XGB model had better clinical decision-making ability and predictive power than the other seven models. Lastly, we developed an online web calculator using the XGB model to assist doctors in making informed decisions and to facilitate the model's wider adoption (https://share.streamlit.io/woshiwz/rectal_cancer/main/lung.py). Conclusion: In this study, we developed an XGB model based on clinicopathological information to predict the risk of lung metastasis in patients with rectal cancer, which may help physicians make clinical decisions.

Project description: Background: Identifying individuals who are unlikely to adhere to a physical exercise regime has the potential to improve physical activity interventions. The aim of this paper is to develop and test adherence prediction models using objectively measured physical activity data from the Mobile Phone-Based Physical Activity Education program (mPED) trial. To the best of our knowledge, this is the first study to apply machine learning methods to predict exercise relapse using accelerometer-recorded physical activity data. Methods: We use logistic regression and support vector machine methods to design two versions of a Discontinuation Prediction Score (DiPS), which uses objectively measured past data (e.g., steps and goal achievement) to provide a numerical quantity indicating the likelihood of exercise relapse in the upcoming week. The prediction accuracies of these two versions of DiPS are compared, and numerical simulation is then performed to explore the potential of using DiPS to selectively allocate financial incentives to participants to encourage them to increase physical activity. Results: We had access to physical activity trial data that were collected continuously, every 60 seconds of every day, for 9 months in 210 participants. Using the first 15 weeks of data for training and weeks 16-30 for testing, we show that both versions of DiPS have a test AUC of 0.9, with high sensitivity and specificity in predicting the probability of exercise adherence. Simulation results assuming different intervention regimes suggest that using DiPS as a score to allocate resources in physical activity intervention programs could reduce costs relative to other allocation schemes. Conclusions: DiPS is capable of making accurate and robust predictions for future weeks. The most predictive features are steps and physical activity intensity. Furthermore, DiPS scores are a promising way to determine when or if to provide just-in-time messages and step goal adjustments to improve compliance. Further studies on the use of DiPS in the design of physical activity promotion programs are warranted. Trial registration: ClinicalTrials.gov NCT01280812. Registered on January 21, 2011.