Dataset Information

Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation.

ABSTRACT:

Background

Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models at the patient level, and claim data are one of the more useful resources to this end. To avoid unnecessary diagnostic intervention after cancer screening tests, patient-level prediction models should be developed.

Objective

We aimed to develop cancer prediction models that are explainable and easily applicable in real-world environments, using nationwide claim databases and machine learning algorithms.

Methods

As source data, we used the Korean National Insurance System Database. Every Korean aged ≥40 years undergoes a national health checkup every 2 years. We gathered all variables from the database, including demographic information, basic laboratory values, anthropometric values, and previous medical history. We applied conventional logistic regression methods, light gradient boosting methods, neural networks, survival analysis, and one-class embedding classifier methods, which use deep learning-based anomaly detection to analyze high-dimensional data effectively. Performance was measured with area under the curve and area under the precision-recall curve. We validated our models externally with a health checkup database from a tertiary hospital.
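
The two performance measures named in the Methods can be illustrated with a minimal sketch. It assumes that "area under the curve" refers to the receiver operating characteristic curve, and it estimates the area under the precision-recall curve by average precision, which is one common estimator and not necessarily the one the authors used. The function names, labels, and scores are invented for illustration and are not study data.

```python
def roc_auc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each true positive.

    Tied scores are ranked in input order here, so results with ties
    depend on that order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            total += true_positives / rank
    return total / n_pos


# Toy example (hypothetical values): 1 = cancer diagnosed, 0 = not diagnosed.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1]
toy_scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.20, 0.05, 0.90]
print(f"AUROC: {roc_auc(toy_labels, toy_scores):.3f}")
print(f"AUPRC (average precision): {average_precision(toy_labels, toy_scores):.3f}")
```

With rare outcomes such as cancer, the precision-recall measure is much more sensitive to false positives than AUROC, which is one plausible reason the two measures can favor different models.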

Results

The one-class embedding classifier model achieved the highest area under the curve scores, with values of 0.868, 0.849, 0.798, 0.746, 0.800, 0.749, and 0.790 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. For area under the precision-recall curve, the light gradient boosting models had the highest scores, with values of 0.383, 0.401, 0.387, 0.300, 0.385, 0.357, and 0.296 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively.

Conclusions

Our results show that it is possible to easily develop applicable cancer prediction models with nationwide claim data using machine learning. The 7 models showed acceptable performances and explainability, and thus can be distributed easily in real-world environments.

SUBMITTER: Lee E

PROVIDER: S-EPMC8438609 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Project description: OBJECTIVE: Some patients who are given opioids for pain could develop opioid use disorder. If it were possible to identify patients who are at a higher risk of opioid use disorder, then clinicians could spend more time educating these patients about the risks. We develop and validate a model to predict a person's future risk of opioid use disorder at the point before being dispensed their first opioid. METHODS: A patient-level prediction cohort study using four US claims databases with target populations ranging between 343,552 and 384,424 patients. The outcome was a recorded diagnosis of opioid abuse, dependency or unspecified drug abuse as a proxy for opioid use disorder from 1 day until 365 days after the first opioid is dispensed. We trained a regularized logistic regression using candidate predictors consisting of demographics and any conditions, drugs, procedures or visits prior to the first opioid. We then selected the top predictors and created a simple 8 variable score model. RESULTS: We estimated the percentage of new users of opioids with reported opioid use disorder within a year to range between 0.04% and 0.26% across US claims data. We developed an 8 variable Calculator of Risk for Opioid Use Disorder (CROUD) score, derived from the prediction models, to stratify patients into higher and lower risk groups. The 8 baseline variables were age 15-29, medical history of substance abuse, mood disorder, anxiety disorder, low back pain, renal impairment, painful neuropathy and recent ER visit. 1.8% of people were in the high risk group for opioid use disorder and had a score ≥ 23, with the model obtaining a sensitivity of 13%, specificity of 98% and PPV of 1.14% for predicting opioid use disorder. CONCLUSIONS: CROUD could be used by clinicians to obtain personalized risk scores. CROUD could be used to further educate those at higher risk and to personalize new opioid dispensing guidelines such as urine testing. Due to the high false positive rate, it should not be used for contraindication or to restrict utilization.
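
The low PPV reported above, despite 98% specificity, follows from how rare the outcome is. A minimal sketch using Bayes' rule shows the relationship; the function name is hypothetical, and the prevalence values are assumptions taken from the reported 0.04% to 0.26% range, not figures the authors state for the PPV calculation.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(outcome | flagged high risk), from Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Sensitivity and specificity as reported for the score threshold of 23.
SENSITIVITY = 0.13
SPECIFICITY = 0.98

# Assumed prevalences: the two ends of the reported range and one value inside it.
for prevalence in (0.0004, 0.0018, 0.0026):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.2%} -> PPV {ppv:.2%}")
```

Under these assumptions the PPV stays below 2% across the whole reported prevalence range, so most flagged patients are false positives.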

Project description: Because existing risk prediction models for lung cancer were developed in white populations, they may not be appropriate for predicting risk among African-Americans. Therefore, a need exists to construct and validate a risk prediction model for lung cancer that is specific to African-Americans. We analyzed data from 491 African-Americans with lung cancer and 497 matched African-American controls to identify specific risks and incorporate them into a multivariable risk model for lung cancer and estimate the 5-year absolute risk of lung cancer. We performed internal and external validations of the risk model using data on additional cases and controls from the same ongoing multiracial/ethnic lung cancer case-control study from which the model-building data were obtained as well as data from two different lung cancer studies in metropolitan Detroit, respectively. We also compared our African-American model with our previously developed risk prediction model for whites. The final risk model included smoking-related variables [smoking status, pack-years smoked, age at smoking cessation (former smokers), and number of years since smoking cessation (former smokers)], self-reported physician diagnoses of chronic obstructive pulmonary disease or hay fever, and exposures to asbestos or wood dusts. Our risk prediction model for African-Americans exhibited good discrimination [75% (95% confidence interval, 0.67-0.82)] for our internal data and moderate discrimination [63% (95% confidence interval, 0.57-0.69)] for the external data group, which is an improvement over the Spitz model for white subjects. Existing lung cancer prediction models may not be appropriate for predicting risk for African-Americans because (a) they were developed using white populations, (b) level of risk is different for risk factors that African-Americans share with whites, and (c) unique group-specific risk factors exist for African-Americans.
This study developed and validated a risk prediction model for lung cancer that is specific to African-Americans and thus more precise in predicting their risks. These findings highlight the importance of conducting further ethnic-specific analyses of disease risk.

Project description: Background: Studies establishing prognostic models for colon cancer with lung metastasis (CCLM) are lacking. This study aimed to construct and validate prediction models of overall survival (OS) and cancer-specific survival (CSS) probability in CCLM patients. Method: Data on 1,284 patients with CCLM were collected from the Surveillance, Epidemiology, and End Results (SEER) database. Patients were randomly assigned in a 7:3 ratio (stratified by survival time) to a development set and a validation set on the basis of computer-calculated random numbers. After screening the predictors by the least absolute shrinkage and selection operator (LASSO) and multivariate Cox regression, the suitable predictors were entered into Cox proportional hazard models to build prediction models. Calibration curves, concordance index (C-index), time-dependent receiver operating characteristic (ROC) curves, and decision curve analysis (DCA) were used to perform the validation of models. Based on model-predicted risk scores, patients were divided into low-risk and high-risk groups. The Kaplan-Meier (K-M) plots and log-rank test were applied to perform survival analysis between the two groups. Results: Building upon the LASSO and multivariate Cox regression, six variables were significantly associated with OS and CSS (i.e., tumor grade, AJCC T stage, AJCC N stage, chemotherapy, CEA, liver metastasis). In development, validation, and expanded testing sets, AUCs and C-indexes of the OS and CSS prediction models were all greater than or near 0.7, which indicated excellent predictability of models. On the whole, the calibration curves coincided with the diagonal in two models. DCA indicated that the models had higher clinical benefit than any single risk factor. Survival analysis results showed that the prognosis was worse in the high-risk group than in the low-risk group, which suggested that the models had significant discrimination for patients with different prognoses. Conclusion: After verification, our prediction models of CCLM are reliable and can predict the OS and CSS of CCLM patients in the next 1, 3, and 5 years, providing valuable guidance for clinical prognosis estimation and individualized administration of patients with CCLM.
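
The concordance index (C-index) used for validation above can be sketched in a few lines. This is a minimal illustration of Harrell's C-index under simplifying assumptions (pairs with tied survival times are skipped); the function name and the toy follow-up data are hypothetical and unrelated to the SEER cohort.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter follow-up time than subject j. The pair is concordant
    when the model gave subject i the higher risk score; tied risk scores
    count as half. Pairs with tied times are ignored in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in months, 1 = death observed, 0 = censored.
toy_times = [5, 10, 12, 20, 24]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.6, 0.7, 0.3, 0.2]
print(f"C-index: {concordance_index(toy_times, toy_events, toy_risk):.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.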

Project description: Background: Accurate prognostic models could aid medical decision making. Large observational databases often contain temporal medical data for large and diverse populations of patients. It may be possible to learn prognostic models using the large observational data. Often the performance of a prognostic model undesirably worsens when transported to a different database (or into a clinical setting). In this study we investigate different ensemble approaches that combine prognostic models independently developed using different databases (a simple federated learning approach) to determine whether ensembles that combine models developed across databases can improve model transportability (perform better in new data than single database models). Methods: For a given prediction question we independently trained five single database models, each using a different observational healthcare database. We then developed and investigated numerous ensemble models (fusion, stacking and mixture of experts) that combined the different database models. Performance of each model was investigated via discrimination and calibration using a leave one dataset out technique, i.e., hold out one database to use for validation and use the remaining four datasets for model development. The internal validation of a model developed using the hold out database was calculated and presented as the 'internal benchmark' for comparison. Results: In this study the fusion ensembles generally outperformed the single database models when transported to a previously unseen database and the performances were more consistent across unseen databases. Stacking ensembles performed poorly in terms of discrimination when the labels in the unseen database were limited. Calibration was consistently poor when both ensembles and single database models were applied to previously unseen databases. Conclusion: A simple federated learning approach that implements ensemble techniques to combine models independently developed across different databases for the same prediction question may improve the discriminative performance in new data (new database or clinical setting) but will need to be recalibrated using the new data. This could help medical decision making by improving prognostic model performance.
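
The leave one dataset out design described above can be sketched as follows. The sketch assumes that a fusion ensemble is an unweighted average of the member models' predicted risks, which is only one possible fusion rule and may differ from the study's implementation. The database names and the constant-risk toy models are hypothetical placeholders.

```python
def fusion_predict(models, patient):
    """Unweighted average of the predicted risks from all member models."""
    return sum(model(patient) for model in models) / len(models)


def leave_one_database_out(models_by_database):
    """Yield (held_out_name, ensemble) pairs.

    Each ensemble combines only the models trained on the other databases,
    so it can be evaluated on the held-out database as unseen data.
    """
    for held_out in models_by_database:
        members = [
            model
            for name, model in models_by_database.items()
            if name != held_out
        ]
        # Bind members as a default argument so each ensemble keeps its own list.
        yield held_out, (lambda patient, members=members: fusion_predict(members, patient))


# Toy stand-ins for five single-database models: each returns a fixed risk.
toy_models = {
    "db_a": lambda patient: 0.10,
    "db_b": lambda patient: 0.20,
    "db_c": lambda patient: 0.30,
    "db_d": lambda patient: 0.40,
    "db_e": lambda patient: 0.50,
}

for held_out, ensemble in leave_one_database_out(toy_models):
    print(f"held out {held_out}: fused risk {ensemble({'age': 60}):.3f}")
```

Because the abstract reports poor calibration on unseen databases, a real pipeline would likely add a recalibration step on the new data after fusion; that step is not shown here.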