Dataset Information

Patient-Level Prediction of Cardio-Cerebrovascular Events in Hypertension Using Nationwide Claims Data.

ABSTRACT:

Background

Prevention and management of chronic diseases are the main goals of national health maintenance programs. Previously widely used screening tools, such as Health Risk Appraisal, are restricted in achieving this goal because of limitations such as static characteristics, accessibility, and generalizability. Hypertension is one of the most important chronic diseases requiring management via the nationwide health maintenance program, and health care providers should inform patients about their risk of complications caused by hypertension.

Objective

Our goal was to develop and compare machine learning models predicting high-risk vascular diseases for hypertensive patients so that they can manage their blood pressure based on their risk level.

Methods

We used a 12-year longitudinal dataset of the nationwide sample cohort, which contains the data of 514,866 patients and allows tracking of patients' medical history across all health care providers in Korea (N=51,920). To ensure the generalizability of our models, we conducted an external validation using another national sample cohort dataset, comprising one million different patients, published by the National Health Insurance Service. From the two datasets, we obtained the data of 74,535 and 59,738 patients with essential hypertension, respectively, and developed machine learning models for predicting cardiovascular and cerebrovascular events. Six machine learning models were developed, and their performance was compared using validation metrics.
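
The evaluation design described above, in which a model is scored on held-out patients from the development cohort (within test) and on a separate cohort (external test), can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name f1_score and the toy labels and predictions are assumptions introduced here, and no real patient data are involved.

```python
def f1_score(y_true, y_pred):
    """Return the F1-score for the positive class (1 = event occurred)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and predictions (illustrative only).
within_true = [1, 0, 1, 1, 0, 0, 1, 0]    # held-out part of the development cohort
within_pred = [1, 0, 1, 0, 0, 1, 1, 0]
external_true = [1, 1, 0, 0, 1, 0, 1, 0]  # separate cohort, never used in training
external_pred = [1, 0, 0, 1, 0, 0, 1, 0]

within_f1 = f1_score(within_true, within_pred)
external_f1 = f1_score(external_true, external_pred)
print(f"within test F1-score:   {within_f1:.3f}")
print(f"external test F1-score: {external_f1:.3f}")
```

Scoring the same fitted model on both sets separates how well it fits the development cohort from how well it transfers to different patients.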

Results

Machine learning algorithms enabled us to detect high-risk patients based on their medical history. The long short-term memory-based algorithm performed best in the within test (F1-score=.772; external test F1-score=.613), whereas the random forest-based risk prediction algorithm generalized better than the other machine learning algorithms (within test F1-score=.757; external test F1-score=.705). In the within test, the long short-term memory-based algorithm performed best regardless of the number of features. In the external test, however, the random forest-based algorithm was the best, irrespective of the number of features.
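
The generalization comparison can be made concrete by computing how much each model's F1-score drops from the within test to the external test. The sketch below uses only the four F1-scores reported above; the helper name generalization_gap and the choice of absolute and relative drop as summary measures are assumptions made for illustration, not metrics stated in the abstract.

```python
# F1-scores as reported in the Results paragraph.
reported = {
    "long short-term memory": {"within": 0.772, "external": 0.613},
    "random forest": {"within": 0.757, "external": 0.705},
}


def generalization_gap(scores):
    """Return (absolute drop, relative drop) from within test to external test."""
    absolute = scores["within"] - scores["external"]
    relative = absolute / scores["within"]
    return absolute, relative


gaps = {name: generalization_gap(scores) for name, scores in reported.items()}
for name, (absolute, relative) in gaps.items():
    print(f"{name}: absolute drop {absolute:.3f}, relative drop {relative:.1%}")
```

On these figures the long short-term memory model loses about 0.159 F1 (roughly 21% of its within-test score), while the random forest loses about 0.052 (roughly 7%), which is consistent with the statement that the random forest generalized better.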

Conclusions

We developed and compared machine learning models predicting high-risk vascular diseases in hypertensive patients so that they may manage their blood pressure based on their risk level. By relying on the prediction model, a government can predict high-risk patients at the nationwide level and establish health care policies in advance.

SUBMITTER: Park J

PROVIDER: S-EPMC6396076 | biostudies-literature | 2019 Feb

REPOSITORIES: biostudies-literature


Publications

Patient-Level Prediction of Cardio-Cerebrovascular Events in Hypertension Using Nationwide Claims Data.

Park Jaram, Kim Jeong-Whun, Ryu Borim, Heo Eunyoung, Jung Se Young, Yoo Sooyoung

Journal of Medical Internet Research, 2019 Feb 15, issue 2


PMID: 30767907

Similar Datasets

Project description: OBJECTIVE: Some patients who are given opioids for pain could develop opioid use disorder. If it was possible to identify patients who are at a higher risk of opioid use disorder, then clinicians could spend more time educating these patients about the risks. We develop and validate a model to predict a person's future risk of opioid use disorder at the point before being dispensed their first opioid. METHODS: A cohort study patient-level prediction using four US claims databases with target populations ranging between 343,552 and 384,424 patients. The outcome was recorded diagnosis of opioid abuse, dependency or unspecified drug abuse as a proxy for opioid use disorder from 1 day until 365 days after the first opioid is dispensed. We trained a regularized logistic regression using candidate predictors consisting of demographics and any conditions, drugs, procedures or visits prior to the first opioid. We then selected the top predictors and created a simple 8-variable score model. RESULTS: We estimated the percentage of new users of opioids with reported opioid use disorder within a year to range between 0.04%-0.26% across US claims data. We developed an 8-variable Calculator of Risk for Opioid Use Disorder (CROUD) score, derived from the prediction models to stratify patients into higher and lower risk groups. The 8 baseline variables were age 15-29, medical history of substance abuse, mood disorder, anxiety disorder, low back pain, renal impairment, painful neuropathy and recent ER visit. 1.8% of people were in the high risk group for opioid use disorder and had a score ≥ 23 with the model obtaining a sensitivity of 13%, specificity of 98% and PPV of 1.14% for predicting opioid use disorder. CONCLUSIONS: CROUD could be used by clinicians to obtain personalized risk scores. CROUD could be used to further educate those at higher risk and to personalize new opioid dispensing guidelines such as urine testing. Due to the high false positive rate, it should not be used for contraindication or to restrict utilization.

Project description: Background: Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models at the patient level, and claim data are one of the more useful resources to this end. To avoid unnecessary diagnostic intervention after cancer screening tests, patient-level prediction models should be developed. Objective: We aimed to develop cancer prediction models using nationwide claim databases with machine learning algorithms, which are explainable and easily applicable in real-world environments. Methods: As source data, we used the Korean National Insurance System Database. Every Korean aged ≥40 years undergoes a national health checkup every 2 years. We gathered all variables from the database including demographic information, basic laboratory values, anthropometric values, and previous medical history. We applied conventional logistic regression methods, light gradient boosting methods, neural networks, survival analysis, and one-class embedding classifier methods to effectively analyze high-dimensional data based on deep learning-based anomaly detection. Performance was measured with area under the curve and area under precision recall curve. We validated our models externally with a health checkup database from a tertiary hospital. Results: The one-class embedding classifier model received the highest area under the curve scores with values of 0.868, 0.849, 0.798, 0.746, 0.800, 0.749, and 0.790 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. For area under precision recall curve, the light gradient boosting models had the highest score with values of 0.383, 0.401, 0.387, 0.300, 0.385, 0.357, and 0.296 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. Conclusions: Our results show that it is possible to easily develop applicable cancer prediction models with nationwide claim data using machine learning. The 7 models showed acceptable performances and explainability, and thus can be distributed easily in real-world environments.

Project description: Background: The cardiovascular and cerebrovascular risk of postoperative acute kidney injury (AKI) in surgical patients is poorly described, especially in the hypertensive population. Methods: We conducted a retrospective cohort study among all hypertensive patients who underwent elective noncardiac surgery from January 1st, 2012 to August 1st, 2017 at the Third Xiangya Hospital. The primary outcomes were fatal stroke and fatal myocardial infarction (MI). The secondary outcomes were all-cause mortality. Results: The postoperative cumulative mortality within 3 months, 6 months, 1 year, 2 years, and 5 years were 1.27, 1.48, 2.15, 2.15, and 5.36%, for fatal stroke, and 2.05, 2.27, 2.70, 3.37, and 5.61% for fatal MI, respectively, in patients with postoperative AKI. Compared with non-AKI patients, those with postoperative AKI had a significantly higher risk of fatal stroke and fatal MI within 3 months [hazard ratio (HR): 5.49 (95% CI: 1.88-16.00) and 11.82 (95% CI: 4.56-30.62), respectively], 6 months [HR: 3.58 (95% CI: 1.43-8.97) and 9.23 (95% CI: 3.89-21.90), respectively], 1 year [HR: 3.64 (95% CI: 1.63-8.10) and 5.14 (95% CI: 2.50-10.57), respectively], 2 years [HR: 2.21 (95% CI: 1.03-4.72) and 3.06 (95% CI: 1.66-5.64), respectively], and 5 years [HR: 2.27 (95% CI: 1.30-3.98) and 1.98 (95% CI: 1.16-3.20), respectively]. In subgroup analysis of perioperative blood pressure (BP) lowering administration, postoperative AKI was significantly associated with 1-year and 5-year risk of fatal stroke [HR: 9.46 (95% CI: 2.85-31.40) and 3.88 (95% CI: 1.67-9.01), respectively] in patients with ACEI/ARB, and MI [HR: 6.62 (95% CI: 2.23-19.62) and 2.44 (95% CI: 1.22-4.90), respectively] in patients with CCB. Conclusion: Hypertensive patients with postoperative AKI have a significantly higher risk of fatal stroke and fatal MI, as well as all-cause mortality, within 5 years after elective noncardiac surgery. In patients with perioperative administration of ACEI/ARB and CCB, postoperative AKI was significantly associated with higher risk of fatal stroke and MI, respectively.

Project description: Background: Hypertension and diabetes mellitus are two of the major risk factors for cardio-cerebrovascular diseases (CVDs). Although prior studies have confirmed that the coexistence of the two can markedly increase the risk of CVDs, few studies investigated whether potential interaction effects of hypertension and diabetes can result in greater cardio-cerebrovascular damage. We aimed to investigate the prevalence of hypertension and diabetes and whether they both affect synergistically the risk of CVDs. Methods: A cross-sectional study was conducted by using a multistage stratified random sampling among communities in Changsha City, Hunan Province. Study participants aged ≥ 18 years were asked to complete questionnaires and physical examinations. Multivariate logistic regression models were performed to evaluate the association of diabetes, hypertension, and their multiplicative interaction with CVDs with adjustment for potential confounders. We also evaluated additive interaction with the relative excess risk ratio (RERI), attribution percentage (AP), synergy index (SI). Results: A total of 14,422 participants aged 18-98 years were collected (men = 5827, 40.7%). The prevalence was 22.7% for hypertension, 7.0% for diabetes, and 3.8% for diabetes with hypertension complication, respectively. Older age, women, higher educational level, unmarried status, obesity (central obesity) were associated with increased risk of hypertension and diabetes. We did not find significant multiplicative interaction of diabetes and hypertension on CVDs, but observed a synergistic additive interaction on coronary heart disease (SI, 1.43; 95% CI, 1.03-1.97; RERI, 1.94; 95% CI, 0.05-3.83; AP, 0.26; 95% CI, 0.06-0.46). Conclusions: Diabetes and hypertension were found to be associated with a significantly increased risk of CVDs and a significant synergistic additive interaction of diabetes and hypertension on coronary heart disease was observed. Participants who were old, women, highly educated, unmarried, obese (central obese) had increased risk of diabetes and hypertension.

Project description: Introduction: US claims data contain medical data on large heterogeneous populations and are excellent sources for medical research. Some claims data do not contain complete death records, limiting their use for mortality or mortality-related studies. A model to predict whether a patient died at the end of the follow-up time (referred to as the end of observation) is needed to enable mortality-related studies. Objective: The objective of this study was to develop a patient-level model to predict whether the end of observation was due to death in US claims data. Methods: We used a claims dataset with full death records, Optum© De-Identified Clinformatics® Data-Mart-Database-Date of Death mapped to the Observational Medical Outcome Partnership common data model, to develop a model that classifies the end of observations into death or non-death. A regularized logistic regression was trained using 88,514 predictors (recorded within the prior 365 or 30 days) and externally validated by applying the model to three US claims datasets. Results: Approximately 25 in 1000 end of observations in Optum are due to death. The Discriminating End of observation into Alive and Dead (DEAD) model obtained an area under the receiver operating characteristic curve of 0.986. When defining death as a predicted risk of > 0.5, only 2% of the end of observations were predicted to be due to death and the model obtained a sensitivity of 62% and a positive predictive value of 74.8%. The external validation showed the model was transportable, with area under the receiver operating characteristic curves ranging between 0.951 and 0.995 across the US claims databases. Conclusions: US claims data often lack complete death records. The DEAD model can be used to impute death at various sensitivity, specificity, or positive predictive values depending on the use of the model. The DEAD model can be readily applied to any observational healthcare database mapped to the Observational Medical Outcome Partnership common data model and is available from https://github.com/OHDSI/StudyProtocolSandbox/tree/master/DeadModel .

