Dataset Information

Stroke risk prediction using machine learning: a prospective cohort study of 0.5 million Chinese adults.

ABSTRACT:

Objective

To compare Cox models, machine learning (ML), and ensemble models combining both approaches, for prediction of stroke risk in a prospective study of Chinese adults.

Materials and methods

We evaluated models for stroke risk at varying intervals of follow-up (<9 years, 0-3 years, 3-6 years, 6-9 years) in 503 842 adults without prior history of stroke recruited from 10 areas in China in 2004-2008. Inputs included sociodemographic factors, diet, medical history, physical activity, and physical measurements. We compared discrimination and calibration of Cox regression, logistic regression, support vector machines, random survival forests, gradient boosted trees (GBT), and multilayer perceptrons, benchmarking performance against the 2017 Framingham Stroke Risk Profile. We then developed an ensemble approach to identify individuals at high risk of stroke (>10% predicted 9-yr stroke risk) by selectively applying either a GBT or Cox model based on individual-level characteristics.
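The selective ensemble described above can be illustrated with a minimal sketch. The abstract does not state the rule used to choose between the GBT and Cox models, so the selector here (`toy_selector`), the two stand-in risk functions (`toy_gbt`, `toy_cox`), and the input field names are all hypothetical placeholders and not the authors' method. Only the >10% threshold for 9-yr risk comes from the text.

```python
from typing import Callable, List, Mapping, Sequence

Person = Mapping[str, float]
RiskModel = Callable[[Person], float]      # returns predicted 9-yr stroke risk in [0, 1]
Selector = Callable[[Person], bool]        # True -> use the GBT model, False -> use Cox

HIGH_RISK_THRESHOLD = 0.10  # ">10% predicted 9-yr stroke risk" (from the abstract)


def ensemble_risk(person: Person, gbt: RiskModel, cox: RiskModel, use_gbt: Selector) -> float:
    """Apply exactly one of the two models, chosen per individual."""
    model = gbt if use_gbt(person) else cox
    return model(person)


def flag_high_risk(people: Sequence[Person], gbt: RiskModel, cox: RiskModel,
                   use_gbt: Selector) -> List[bool]:
    """Flag individuals whose ensemble-predicted risk exceeds the threshold."""
    return [ensemble_risk(p, gbt, cox, use_gbt) > HIGH_RISK_THRESHOLD for p in people]


# --- Hypothetical stand-ins, for illustration only (not fitted models) ---
def toy_gbt(p: Person) -> float:
    return 0.002 * p["age"]


def toy_cox(p: Person) -> float:
    return 0.001 * p["age"] + 0.05 * p["hypertension"]


def toy_selector(p: Person) -> bool:
    # Assumed rule: route older individuals to the GBT model.
    return p["age"] >= 60


people = [
    {"age": 70, "hypertension": 0},  # routed to toy_gbt
    {"age": 40, "hypertension": 1},  # routed to toy_cox
    {"age": 50, "hypertension": 0},  # routed to toy_cox
]
flags = flag_high_risk(people, toy_gbt, toy_cox, toy_selector)
```

In practice the two risk functions would be fitted models and the selector would be learned or specified from individual-level characteristics; the sketch only shows the routing structure.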

Results

For 9-yr stroke risk prediction, GBT provided the best discrimination (AUROC: 0.833 in men, 0.836 in women) and calibration, with consistent results in each interval of follow-up. The ensemble approach yielded incrementally higher accuracy (men: 76%, women: 80%), specificity (men: 76%, women: 81%), and positive predictive value (men: 26%, women: 24%) compared to any of the single-model approaches.
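For readers unfamiliar with the metrics reported above, the sketch below shows how accuracy, specificity, positive predictive value, and AUROC can be computed from predicted risks and observed outcomes. It assumes a binary 9-yr outcome with no censoring, which is a simplification of prospective follow-up data. The function names and toy numbers are illustrative only and are not study data; the >10% cut-off is the high-risk threshold given in the abstract.

```python
from typing import Dict, Sequence


def summary_metrics(y_true: Sequence[int], risk: Sequence[float],
                    threshold: float = 0.10) -> Dict[str, float]:
    """Accuracy, specificity and PPV when 'high risk' means risk > threshold."""
    tp = fp = tn = fn = 0
    for y, r in zip(y_true, risk):
        predicted_high = r > threshold
        if predicted_high and y:
            tp += 1
        elif predicted_high and not y:
            fp += 1
        elif not predicted_high and y:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / n if n else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
    }


def auroc(y_true: Sequence[int], risk: Sequence[float]) -> float:
    """AUROC as the probability that a case is ranked above a non-case (ties count 0.5)."""
    cases = [r for y, r in zip(y_true, risk) if y]
    controls = [r for y, r in zip(y_true, risk) if not y]
    if not cases or not controls:
        return float("nan")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy example (made-up values): 1 = stroke during follow-up, 0 = no stroke.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [0.30, 0.05, 0.12, 0.08, 0.02, 0.20, 0.04, 0.15]
metrics = summary_metrics(outcomes, predicted)
auc = auroc(outcomes, predicted)
```

The pairwise AUROC loop is quadratic in sample size and is meant for clarity; for a cohort of this scale a rank-based implementation from an established library would be the practical choice.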

Discussion and conclusion

Among several approaches, an ensemble model combining both GBT and Cox models achieved the best performance for identifying individuals at high risk of stroke in a contemporary study of Chinese adults. The results highlight the potential value of expanding the use of ML in clinical practice.

SUBMITTER: Chun M

PROVIDER: S-EPMC8324240 | biostudies-literature | 2021 Jul

REPOSITORIES: biostudies-literature

Publications

Stroke risk prediction using machine learning: a prospective cohort study of 0.5 million Chinese adults.

Chun M, Clarke R, Cairns BJ, Clifton D, Bennett D, Chen Y, Guo Y, Pei P, Lv J, Yu C, Yang L, Li L, Chen Z, Zhu T

Journal of the American Medical Informatics Association (JAMIA), 2021 Jul 1, issue 8

PMID: 33969418

Similar Datasets

Project description:

Background: Suicide is a leading cause of death in China and accounts for about one-sixth of all suicides worldwide. The objective of this study was to examine the recent distribution of suicide and risk factors for death by suicide. Identifying underlying risk factors could benefit development of evidence-based prevention and intervention programs.

Methods and findings: We conducted a prospective study, the China Kadoorie Biobank, of 512,715 individuals (41% men, mean age 52 years) from 10 (5 urban, 5 rural) areas that are diverse across China in geographic location, socioeconomic development stage, and prevalence of disease patterns. After the baseline measurements of risk factors during 2004 to 2008, participants were followed up for suicide outcomes including suicide and possible suicide deaths. Risk factors, such as sociodemographic factors and physical and mental health status, were assessed by semistructured interviews and self-report questionnaires. Suicide and possible suicide deaths were identified through linkage to the local death registries using ICD-10 codes. We conducted Cox regression to calculate hazard ratios (HRs) for suicide and for possible suicide in sensitivity analyses. During an average follow-up period of 9.9 years, 520 (101 per 100,000) people died from suicide (51.3% male), and 79.8% of them lived in rural areas. Sociodemographic factors associated with increased suicide risk were male gender (adjusted hazard ratio [aHR] = 1.6 [95% CI 1.4 to 2.0], p < 0.001), older age (1.3 [1.2 to 1.5] per 10-yr increase, p < 0.001), rural residence (2.6 [2.1 to 3.3], p < 0.001), and single status (1.7 [1.4 to 2.2], p < 0.001). Increased hazards were found for family-related stressful life events (aHR = 1.8 [1.2 to 1.9], p < 0.001) and for major physical illnesses (1.5 [1.3 to 1.9], p < 0.001). There were strong associations of suicide with a history of lifetime mental disorders (aHR = 9.6 [5.9 to 15.6], p < 0.001) and lifetime schizophrenia-spectrum disorders (11.0 [7.1 to 17.0], p < 0.001). Links between suicide risk and depressive disorders (aHR = 2.6 [1.4 to 4.8], p = 0.002) and generalized anxiety disorders (2.6 [1.0 to 7.1], p = 0.056) in the last 12 months, and sleep disorders (1.4 [1.2 to 1.7], p < 0.001) in the past month were also found. All HRs were adjusted for sociodemographic factors including gender, age, residence, single status, education, and income. The associations with possible suicide deaths were mostly similar to those with suicide deaths, although there was no clear link between possible suicide deaths and psychiatric factors such as depression and generalized anxiety disorders. A limitation of the study is that there is likely underreporting of mental disorders due to the use of self-report information for some diagnostic categories.

Conclusions: In this study, we observed that a range of sociodemographic, lifestyle, stressful life event, physical, and mental health factors were associated with suicide in China. High-risk groups identified were elderly men in rural settings and individuals with mental disorders. These findings could form the basis of targeted approaches to reduce suicide mortality in China.

Project description:

Background: Little prospective evidence exists about risk factors and prognosis of acute pancreatitis in China. We examined the associations of certain metabolic and lifestyle factors with risk of acute pancreatitis in Chinese adults.

Methods and findings: The prospective China Kadoorie Biobank (CKB) recruited 512,891 adults aged 30 to 79 years from 5 urban and 5 rural areas between 25 June 2004 and 15 July 2008. During 9.2 years of follow-up (to 1 January 2015), 1,079 cases of acute pancreatitis were recorded. Cox regression was used to estimate adjusted hazard ratios (HRs) for acute pancreatitis associated with various metabolic and lifestyle factors among all or male (for smoking and alcohol drinking) participants. Overall, the mean waist circumference (WC) was 82.1 cm (SD 9.8) in men and 79.0 cm (SD 9.5) in women, 6% had diabetes, and 6% had gallbladder disease at baseline. WC was positively associated with risk of acute pancreatitis, with an adjusted HR of 1.35 (95% CI 1.27-1.43; p < 0.001) per 1-SD-higher WC. Individuals with diabetes or gallbladder disease had HRs of 1.34 (1.07-1.69; p = 0.01) and 2.42 (2.03-2.88; p < 0.001), respectively. Physical activity was inversely associated with risk of acute pancreatitis, with each 4 metabolic equivalent of task (MET) hours per day (MET-h/day) higher physical activity associated with an adjusted HR of 0.95 (0.91-0.99; p = 0.03). Compared with those without any metabolic risk factors (i.e., obesity, diabetes, gallbladder disease, and physical inactivity), the HRs of acute pancreatitis for those with 1, 2, or ≥3 risk factors were 1.61 (1.47-1.76), 2.36 (2.01-2.78), and 3.41 (2.46-4.72), respectively (p < 0.001). Among men, heavy alcohol drinkers (≥420 g/week) had an HR of 1.52 (1.11-2.09; p = 0.04, compared with abstainers), and current regular smokers had an HR of 1.45 (1.28-1.64; p = 0.02, compared with never smokers). Following a diagnosis of acute pancreatitis, there were higher risks of pancreatic cancer (HR = 8.26 [3.42-19.98]; p < 0.001; 13 pancreatic cancer cases) and death (1.53 [1.17-2.01]; p = 0.002; 89 deaths). Other diseases of the pancreas had similar risk factor profiles and prognosis to acute pancreatitis. The main study limitations are ascertainment of pancreatitis using hospital records and residual confounding.

Conclusions: In this relatively lean Chinese population, several modifiable metabolic and lifestyle factors were associated with higher risks of acute pancreatitis, and individuals with acute pancreatitis had higher risks of pancreatic cancer and death.

Project description:

Background: The effect of overall diet quality on cardiometabolic diseases has been well studied in Western populations. However, evidence is still needed on dietary patterns that reflect distinctive Chinese dietary habits and their associations with cardiometabolic diseases.

Methods: A prospective cohort recruited around 0.5 million Chinese residents aged 30-79 years from 10 diverse survey sites during 2004-08. Dietary patterns were obtained using factor analysis based on the habitual consumption of 12 food groups collected at baseline. Among 477,465 eligible participants free of prior heart disease, stroke and cancer, linkages to multiple registries and a health insurance database recorded 137,715 cardiovascular diseases (CVD) and 17,412 diabetes cases (among 451,846 non-diabetic participants) until 31 December 2017. Adjusted hazard ratios (HRs) were estimated to compare the risks of cardiometabolic diseases across quintiles of dietary pattern scores using Cox regression.

Results: Two dietary patterns were derived: the traditional northern pattern, characterised by wheat, other staples, egg and dairy products; and the modern pattern, characterised by fresh fruit, meat, poultry, fish, dairy products and soybean. Adherence to either dietary pattern was associated with lower risks of major cardiometabolic diseases in a dose-response manner. After multivariate adjustment, participants with the highest adherence to the traditional northern pattern had an 8% (95%CI: 5-11%) lower risk of CVD than those with the lowest adherence. Corresponding risk reductions were 12% (11-32%) for haemorrhagic stroke (HS), 14% (8-19%) for ischaemic stroke (IS), and 15% (6-24%) for diabetes, respectively. When comparing extreme quintiles of the modern pattern, the adjusted HR of HS was 0.67 (95%CI: 0.59-0.77). Corresponding HRs were 0.89 (0.86-0.92) for CVD, 0.88 (0.77-0.99) for MCE, 0.85 (0.80-0.89) for IS, and 0.89 (0.81-0.97) for diabetes.

Conclusion: Among Chinese adults, both traditional northern and modern dietary patterns were associated with lower risks of cardiovascular disease and diabetes beyond other risk factors.
