Dataset Information

Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database.

ABSTRACT: This study proposes a cardiovascular disease (CVD) prediction model using machine learning (ML) algorithms based on the National Health Insurance Service-Health Screening datasets. We extracted 4699 patients aged over 45 years as the CVD group, diagnosed according to International Classification of Diseases (ICD) codes I20-I25. In addition, 4699 randomly selected subjects without a CVD diagnosis were enrolled as the non-CVD group. Both groups were matched by age and gender. Various ML algorithms were applied to CVD prediction, and the performance of all prediction models was compared. The extreme gradient boosting, gradient boosting, and random forest algorithms achieved the best average prediction performance (area under the receiver operating characteristic curve (AUROC): 0.812, 0.812, and 0.811, respectively) among all algorithms validated in this study. Based on AUROC, the ML algorithms improved CVD prediction performance compared with previously proposed prediction models. A preexisting CVD history was the most important factor contributing to the accuracy of the prediction model, followed by total cholesterol, low-density lipoprotein cholesterol, waist-height ratio, and body mass index. Our results indicate that the proposed health screening dataset-based CVD prediction model using ML algorithms is readily applicable, produces validated results, and outperforms previous CVD prediction models.
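The AUROC values reported above summarize how well each model ranks CVD subjects above non-CVD subjects. As a minimal sketch (not the authors' code), the rank-based definition can be computed directly: AUROC is the probability that a randomly chosen CVD subject receives a higher predicted risk than a randomly chosen non-CVD subject, with ties counted as one half. The labels, scores, and model names below are hypothetical; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.

```python
from itertools import product


def auroc(labels, scores):
    """Rank-based AUROC: probability that a random positive (CVD) subject
    scores higher than a random negative (non-CVD) subject; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = CVD, 0 = non-CVD) and predicted risks from two models.
labels = [1, 1, 1, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.7, 0.6, 0.3, 0.6, 0.4, 0.2],
}
for name, scores in model_scores.items():
    print(name, round(auroc(labels, scores), 3))
```

The pairwise loop costs O(n_pos × n_neg), which is acceptable for a sketch; sort-based implementations compute the same quantity in O(n log n).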

SUBMITTER: Kim JOR

PROVIDER: S-EPMC8229422 | biostudies-literature | 2021 May

REPOSITORIES: biostudies-literature

Publications

Machine Learning-Based Cardiovascular Disease Prediction Model: A Cohort Study on the Korean National Health Insurance Service Health Screening Database.

Kim JOR, Jeong YS, Kim JH, Lee JW, Park D, Kim HS

Diagnostics (Basel, Switzerland). 2021 May 25; issue 6.

PMID: 34070504

Similar Datasets

Project description:
Objectives: Dementia is common in people over the age of 65 years, and 80% of people with dementia are older than 75 years. Previous studies have linked dementia to late-life depression, but the association between dementia and mid-life depression is poorly understood. Depression is a preventable and treatable medical condition, which makes it a modifiable factor that can potentially prevent or delay dementia. This study aimed to identify the association between dementia and depression across the life course.
Design: A nationwide, retrospective, propensity score matched cohort study associating dementia with depression. Depression diagnosed between the ages of 45 and 64 years was classified as 'mid-life', and depression diagnosed at 65 years or older as 'late-life'. Patients were considered to have depression when one or more International Statistical Classification of Diseases and Related Health Problems, 10th Revision codes for depression were recorded as a primary or secondary diagnosis.
Setting: The National Health Insurance Service-National Sample Cohort database of the National Health Insurance Service in South Korea, containing patient data from 2002 to 2013.
Participants: The study included 1824 and 374 852 patients in the case and control groups, respectively. A logistic regression analysis with a complex sampling design was performed after adjusting for covariates, using propensity score matching without callipers with a 1:1 nearest neighbour matching algorithm.
Primary and secondary outcome measures: The association of mid-onset and late-onset depression with dementia in terms of sociodemographic characteristics, such as sex and age, within the Korean population.
Results: Dementia was significantly associated with the presence of depression (OR=2.20, 95% CI=1.53-3.14); in particular, female patients with depression and patients aged 45-64 years with depression had increased odds of dementia (OR=2.65, 95% CI=1.78-3.93 and OR=2.72, 95% CI=1.41-5.24, respectively).
Conclusion: Depression is an associated factor for dementia, especially among people aged 45-64 years (mid-life).
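The dementia study describes 1:1 nearest-neighbour propensity score matching without callipers. A minimal sketch of that idea follows, assuming the propensity scores have already been estimated (for example, from a logistic model of case status on covariates) and using greedy matching without replacement. The function name, the subject IDs, the scores, and the order in which cases are processed are all hypothetical choices for illustration; the study does not state which software or matching order it used.

```python
def nearest_neighbour_match(case_scores, control_scores):
    """Greedy 1:1 nearest-neighbour matching on propensity score,
    without replacement and without a calliper.

    case_scores / control_scores: dicts mapping a subject ID to its
    estimated propensity score. Returns a list of (case_id, control_id) pairs.
    """
    available = dict(control_scores)  # controls not yet matched
    pairs = []
    # Assumed convention: process cases from highest to lowest score.
    for case_id, ps in sorted(case_scores.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break  # no controls left to match
        # No calliper: take the closest control however far away it is
        # (ties broken by control ID so the result is deterministic).
        control_id = min(available, key=lambda cid: (abs(available[cid] - ps), cid))
        pairs.append((case_id, control_id))
        del available[control_id]  # without replacement
    return pairs


# Hypothetical propensity scores.
cases = {"case1": 0.62, "case2": 0.35}
controls = {"c1": 0.10, "c2": 0.33, "c3": 0.60, "c4": 0.90}
print(nearest_neighbour_match(cases, controls))
```

Because greedy matching depends on processing order, different implementations can return different pairs; optimal matching, which minimizes the total distance across all pairs, is a common alternative.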

Project description:
Objective: Health behaviour is one of the major determinants of cardiovascular disease in the working population. This study aimed to investigate the trend in cardiovascular health levels and the relationship between continuous health behaviours and changes in cardiovascular disease risk among male workers, using a nationwide database.
Design: This study is a retrospective cohort study.
Setting and participants: The study analysed data from 57 837 male workers whose personal health examination data were continuously traced in Korea's National Health Insurance Service-National Sample Cohort 2.0 database.
Primary outcome measures: The 10-year trend for all cardiovascular risks and the change in these risks according to the consistent performance of healthy behaviours.
Results: The risks of being overweight (adjusted OR (aOR) 1.63, 95% CI 1.59 to 1.68) and obese (aOR 1.51, 95% CI 1.47 to 1.56) increased. Cardiovascular risk indices also increased for high fasting glucose (aOR 1.77, 95% CI 1.62 to 1.95) and high total cholesterol (aOR 1.68, 95% CI 1.60 to 1.76). Among high-risk smokers, the risks of high fasting glucose (aOR 2.09, 95% CI 1.40 to 3.13), high triglycerides (aOR 1.27, 95% CI 1.14 to 1.42) and high low-density lipoprotein cholesterol (aOR 1.38, 95% CI 1.14 to 1.66) were increased. Similarly, among high-risk drinkers, the risks of high total cholesterol (aOR 2.20, 95% CI 1.35 to 3.58) and high triglycerides (aOR 1.42, 95% CI 1.09 to 1.85) were increased. In addition, increased risks of being overweight (aOR 2.20, 95% CI 1.83 to 2.65) and obese (aOR 1.90, 95% CI 1.59 to 2.27) were found among those who had not exercised consistently.
Conclusions: Since the pattern of change in cardiovascular risk levels related to the continuous health behaviours of male workers was identified, the findings of the present study can serve as baseline data for developing health promotion policies for this population.
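Adjusted odds ratios such as those above are usually exponentiated logistic-regression coefficients. Assuming the reported CIs are Wald intervals (symmetric on the log-odds scale, which the abstract does not state), the coefficient and its standard error can be back-calculated from a reported interval and converted back into an OR with a 95% CI. This is a minimal sketch with hypothetical function names, using the overweight result (aOR 1.63, 95% CI 1.59 to 1.68) as input.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def odds_ratio_ci(beta, se, z=Z_95):
    """Wald odds ratio and CI from a logistic-regression coefficient
    (log-odds scale) and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_or_from_ci(lower, upper, z=Z_95):
    """Back-calculate the log odds ratio and its standard error from a
    reported CI, assuming the CI is a Wald interval (symmetric on the log scale)."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2  # centre of the interval on the log scale
    se = (log_hi - log_lo) / (2 * z)  # half-width divided by the z quantile
    return beta, se


# Reported result: overweight, aOR 1.63 (95% CI 1.59 to 1.68).
beta, se = log_or_from_ci(1.59, 1.68)
or_, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.4f}, SE={se:.4f}, OR={or_:.2f} ({lo:.2f} to {hi:.2f})")
```

Because published values are rounded to two decimals, the recovered point estimate matches the reported aOR only to that precision.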
