Dataset Information

Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants.

ABSTRACT:

Background

Identifying people at risk of cardiovascular diseases (CVD) is a cornerstone of preventative cardiology. Risk prediction models currently recommended by clinical guidelines are typically based on a limited number of predictors with sub-optimal performance across all patient groups. Data-driven techniques based on machine learning (ML) might improve the performance of risk predictions by agnostically discovering novel risk predictors and learning the complex interactions between them. We tested (1) whether ML techniques based on a state-of-the-art automated ML framework (AutoPrognosis) could improve CVD risk prediction compared to traditional approaches, and (2) whether considering non-traditional variables could increase the accuracy of CVD risk predictions.

Methods and findings

Using data on 423,604 participants without CVD at baseline in UK Biobank, we developed an ML-based model for predicting CVD risk based on 473 available variables. Our ML-based model was derived using AutoPrognosis, an algorithmic tool that automatically selects and tunes ensembles of ML modeling pipelines (comprising data imputation, feature processing, classification and calibration algorithms). We compared our model with a well-established risk prediction algorithm based on conventional CVD risk factors (Framingham score), a Cox proportional hazards (PH) model based on familiar risk factors (i.e., age, gender, smoking status, systolic blood pressure, history of diabetes, receipt of treatment for hypertension and body mass index), and a Cox PH model based on all of the 473 available variables. Predictive performance was assessed using the area under the receiver operating characteristic curve (AUC-ROC). Overall, our AutoPrognosis model improved risk prediction (AUC-ROC: 0.774, 95% CI: 0.768-0.780) compared to the Framingham score (AUC-ROC: 0.724, 95% CI: 0.720-0.728, p < 0.001), the Cox PH model with conventional risk factors (AUC-ROC: 0.734, 95% CI: 0.729-0.739, p < 0.001), and the Cox PH model with all UK Biobank variables (AUC-ROC: 0.758, 95% CI: 0.753-0.763, p < 0.001). Out of 4,801 CVD cases recorded within 5 years of baseline, AutoPrognosis correctly predicted 368 more cases than the Framingham score. Our AutoPrognosis model included predictors that are not usually considered in existing risk prediction models, such as the individuals' usual walking pace and their self-reported overall health rating. Furthermore, our model improved risk prediction in potentially relevant sub-populations, such as individuals with a history of diabetes. We also highlight the relative benefits accrued from including more information in a predictive model (information gain) as compared to the benefits of using more complex models (modeling gain).
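
For readers unfamiliar with the metric used above, AUC-ROC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The following is a minimal sketch of that pairwise definition in plain Python. The function name `auc_roc` and the toy labels and scores are illustrative assumptions, not data or code from the study, and the quadratic loop is only practical for small samples.

```python
def auc_roc(labels, scores):
    """Pairwise (rank-based) AUC-ROC.

    Returns the fraction of (case, non-case) pairs in which the case has
    the higher score; tied pairs count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical toy data: 1 = developed CVD, 0 = did not.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]

toy_auc = auc_roc(labels, scores)
print(round(toy_auc, 3))  # 8 of 9 pairs are ordered correctly -> 0.889
```

An AUC-ROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.724 to 0.774 should be read.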

Conclusions

Our AutoPrognosis model improves the accuracy of CVD risk prediction in the UK Biobank population. This approach performs well in traditionally poorly served patient subgroups. Additionally, AutoPrognosis uncovered novel predictors of CVD that may now be tested in prospective studies. We found that the "information gain" achieved by considering more risk factors in the predictive model was significantly higher than the "modeling gain" achieved by adopting complex predictive models.
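
One rough way to see the two gains in the AUC-ROC values reported in this abstract is sketched below. It assumes that "information gain" can be approximated by the difference between the two Cox PH models (all 473 variables versus conventional risk factors) and "modeling gain" by the difference between AutoPrognosis and the Cox PH model on all variables. This decomposition is an assumption made here for illustration; the paper may define and estimate the two quantities differently.

```python
# AUC-ROC point estimates as reported in the abstract.
auc = {
    "framingham": 0.724,
    "cox_conventional": 0.734,
    "cox_all_variables": 0.758,
    "autoprognosis": 0.774,
}

# Assumed reading: same model class, more variables.
information_gain = round(auc["cox_all_variables"] - auc["cox_conventional"], 3)

# Assumed reading: same variables, more flexible model.
modeling_gain = round(auc["autoprognosis"] - auc["cox_all_variables"], 3)

print(information_gain, modeling_gain)  # 0.024 0.016
```

Under this reading, adding variables accounts for a larger share of the improvement than switching to a more complex model, which is consistent with the direction of the conclusion stated above.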

SUBMITTER: Alaa AM

PROVIDER: S-EPMC6519796 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants.

Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF, van der Schaar M

PLoS One, 2019-05-15, issue 5

PMID: 31091238

Similar Datasets

Project description:
Background: Recent studies have reported that the associations between dietary carbohydrates and cardiovascular disease (CVD) may depend on the quality, rather than the quantity, of carbohydrates consumed. This study aimed to assess the associations between types and sources of dietary carbohydrates and CVD incidence. A secondary aim was to examine the associations of carbohydrate intakes with triglycerides within lipoprotein subclasses.
Methods: A total of 110,497 UK Biobank participants with ≥ two (maximum five) 24-h dietary assessments who were free from CVD and diabetes at baseline were included. Multivariable-adjusted Cox regressions were used to estimate risks of incident total CVD (4188 cases), ischaemic heart disease (IHD; 3138) and stroke (1124) by carbohydrate intakes over a median follow-up time of 9.4 years, and the effect of modelled dietary substitutions. The associations of carbohydrate intakes with plasma triglycerides within lipoprotein subclasses as measured by nuclear magnetic resonance (NMR) spectroscopy were examined in 26,095 participants with baseline NMR spectroscopy measurements.
Results: Total carbohydrate intake was not associated with CVD outcomes. Free sugar intake was positively associated with total CVD (HR; 95% CI per 5% of energy, 1.07; 1.03-1.10), IHD (1.06; 1.02-1.10), and stroke (1.10; 1.04-1.17). Fibre intake was inversely associated with total CVD (HR; 95% CI per 5 g/d, 0.96; 0.93-0.99). Modelled isoenergetic substitution of 5% of energy from refined grain starch with wholegrain starch was inversely associated with total CVD (0.94; 0.91-0.98) and IHD (0.94; 0.90-0.98), and substitution of free sugars with non-free sugars was inversely associated with total CVD (0.95; 0.92-0.98) and stroke (0.91; 0.86-0.97). Free sugar intake was positively associated with triglycerides within all lipoproteins.
Conclusions: Higher free sugar intake was associated with higher CVD incidence and higher triglyceride concentrations within all lipoproteins. Higher fibre intake and replacement of refined grain starch and free sugars with wholegrain starch and non-free sugars, respectively, may be protective for incident CVD.

Project description:
Background: Atherosclerotic cardiovascular disease (ASCVD) is associated with dementia. However, the risk factors for dementia in patients with ASCVD remain unclear, necessitating the development of accurate prediction models.
Objective: The aim of the study is to develop a machine learning model for use in patients with ASCVD to predict dementia risk using available clinical and sociodemographic data.
Methods: This prognostic study included patients with ASCVD between 2006 and 2010, with registration of follow-up data ending in April 2023, based on the UK Biobank. We implemented a data-driven strategy, identifying predictors from 316 variables and developing a machine learning model to predict the risk of incident dementia, Alzheimer disease, and vascular dementia within 5, 10, and longer-term follow-up in patients with ASCVD.
Results: A total of 29,561 patients with ASCVD were included, and 1334 (4.51%) developed dementia during a median follow-up time of 10.3 (IQR 7.6-12.4) years. The best prediction model (UK Biobank ASCVD risk prediction model) was a light gradient boosting machine comprising 10 predictors: age, time to complete pairs matching tasks, mean time to correctly identify matches, mean sphered cell volume, glucose levels, forced expiratory volume in 1 second z score, C-reactive protein, forced vital capacity, time engaging in activities, and age first had sexual intercourse. This model achieved the following performance metrics for all incident dementia: area under the receiver operating characteristic curve: mean 0.866 (SD 0.027), accuracy: mean 0.883 (SD 0.010), sensitivity: mean 0.637 (SD 0.084), specificity: mean 0.914 (SD 0.012), precision: mean 0.479 (SD 0.031), and F1-score: mean 0.546 (SD 0.043). The model was also well calibrated (Kolmogorov-Smirnov goodness-of-fit test, P value >.99) and maintained robust performance across different temporal cohorts. Decision curve analysis further indicated potential benefit in clinical practice.
Conclusions: The findings of this study suggest that predictive modeling could inform patients and clinicians about dementia risk in patients with ASCVD.

Project description: Non-alcoholic fatty liver disease (NAFLD) has emerged as the most prevalent chronic liver disease worldwide, yet detection has remained largely based on surrogate serum biomarkers, elastography or biopsy. In this study, we used a total of 2959 participants from the UK Biobank cohort to establish the association of dual-energy X-ray absorptiometry (DXA)-derived body composition parameters with NAFLD and leveraged machine learning models to predict NAFLD. The hepatic steatosis reference was based on MRI-PDFF, which has been extensively validated previously. We found several significant associations between hepatic steatosis and traditional measurements of abdominal obesity, namely waist-to-hip ratio (OR = 2.50 (male), 3.35 (female)), android-gynoid ratio (OR = 3.35 (male), 6.39 (female)) and waist circumference (OR = 1.79 (male), 3.80 (female)). Similarly, A Body Shape Index (quantile 4: OR = 1.89 (male), 5.81 (female)) and both the overweight (OR = 6.93 (male), 2.83 (female)) and obese (OR = 14.12 (male), 5.32 (female)) categories of fat mass index were significantly associated with hepatic steatosis. DXA parameters such as visceral adipose tissue mass (OR = 8.37 (male), 19.03 (female)), trunk fat mass (OR = 8.64 (male), 25.69 (female)) and android fat mass (OR = 7.93 (male), 21.77 (female)) were strongly associated with NAFLD. We trained machine learning classifiers (logistic regression and two histogram-based gradient boosting ensembles) to predict hepatic steatosis from traditional body composition indices and DXA parameters; these achieved reasonable performance (AUC = 0.83-0.87). Based on SHapley Additive exPlanations (SHAP) analysis, the DXA parameters that contributed most to the classifiers were those with significant associations with NAFLD. Overall, this study underscores the potential utility of DXA as a practical and potentially opportunistic method for the screening of hepatic steatosis.
