Dataset Information

Developing automated methods for disease subtyping in UK Biobank: an exemplar study on stroke.

ABSTRACT:

Background

Better phenotyping of routinely collected coded data would be useful for research and health improvement. For example, the precision of coded data for hemorrhagic stroke (intracerebral hemorrhage [ICH] and subarachnoid hemorrhage [SAH]) may be lower than 50%. This work aimed to investigate the feasibility and added value of automated methods applied to clinical radiology reports to improve stroke subtyping.

Methods

From a sub-population of 17,249 Scottish UK Biobank participants, we ascertained those with an incident stroke code in hospital, death record or primary care administrative data by September 2015, and ≥ 1 clinical brain scan report. We used a combination of natural language processing and clinical knowledge inference on brain scan reports to assign a stroke subtype (ischemic vs ICH vs SAH) for each participant and assessed performance by precision and recall at entity and patient levels.

Results

Of 225 participants with an incident stroke code, 207 had a relevant brain scan report and were included in this study. Entity-level precision and recall ranged from 78 to 100%. At patient level, the automated methods showed precision and recall that were very good for ICH (both 89%), good for SAH (both 82%), but, as expected, lower for ischemic stroke (73% and 64%, respectively), suggesting coded data remains the preferred method for identifying the latter stroke subtype.
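As a rough illustration of how patient-level precision and recall per subtype can be computed, the sketch below compares a reference label with an automatically assigned label for each participant. It is a minimal sketch, not the study's code: the function name `precision_recall`, the label strings, and the six toy participants are all hypothetical and are not drawn from the study data.

```python
from collections import Counter

# Hypothetical label strings for the three subtypes named in the abstract.
SUBTYPES = ("ischemic", "ICH", "SAH")


def precision_recall(reference, predicted, labels=SUBTYPES):
    """Return {label: (precision, recall)} for paired per-patient labels.

    A value is None when its denominator is zero (label never predicted,
    or never present in the reference).
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, pred in zip(reference, predicted):
        if ref == pred:
            tp[ref] += 1   # correct subtype assigned
        else:
            fp[pred] += 1  # predicted subtype was wrong
            fn[ref] += 1   # true subtype was missed

    scores = {}
    for label in labels:
        predicted_n = tp[label] + fp[label]
        reference_n = tp[label] + fn[label]
        precision = tp[label] / predicted_n if predicted_n else None
        recall = tp[label] / reference_n if reference_n else None
        scores[label] = (precision, recall)
    return scores


# Toy data (invented for illustration only).
reference = ["ICH", "ICH", "SAH", "ischemic", "ischemic", "ischemic"]
predicted = ["ICH", "ischemic", "SAH", "ischemic", "ischemic", "ICH"]
scores = precision_recall(reference, predicted)
```

With these toy labels, ICH scores 0.5 for both precision and recall, SAH scores 1.0 for both, and ischemic stroke scores 2/3 for both. The same counting applies at entity level if each extracted entity, rather than each participant, is treated as one item.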

Conclusions

Our automated method applied to radiology reports provides a feasible, scalable and accurate solution to improve disease subtyping when used in conjunction with administrative coded health data. Future research should validate these findings in a different population setting.

SUBMITTER: Rannikmae K

PROVIDER: S-EPMC8204419 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description:
Background: Identifying people at risk of cardiovascular diseases (CVD) is a cornerstone of preventative cardiology. Risk prediction models currently recommended by clinical guidelines are typically based on a limited number of predictors with sub-optimal performance across all patient groups. Data-driven techniques based on machine learning (ML) might improve the performance of risk predictions by agnostically discovering novel risk predictors and learning the complex interactions between them. We tested (1) whether ML techniques based on a state-of-the-art automated ML framework (AutoPrognosis) could improve CVD risk prediction compared to traditional approaches, and (2) whether considering non-traditional variables could increase the accuracy of CVD risk predictions.
Methods and findings: Using data on 423,604 participants without CVD at baseline in UK Biobank, we developed an ML-based model for predicting CVD risk based on 473 available variables. Our ML-based model was derived using AutoPrognosis, an algorithmic tool that automatically selects and tunes ensembles of ML modeling pipelines (comprising data imputation, feature processing, classification and calibration algorithms). We compared our model with a well-established risk prediction algorithm based on conventional CVD risk factors (Framingham score), a Cox proportional hazards (PH) model based on familiar risk factors (i.e., age, gender, smoking status, systolic blood pressure, history of diabetes, reception of treatments for hypertension and body mass index), and a Cox PH model based on all of the 473 available variables. Predictive performances were assessed using area under the receiver operating characteristic curve (AUC-ROC). Overall, our AutoPrognosis model improved risk prediction (AUC-ROC: 0.774, 95% CI: 0.768-0.780) compared to Framingham score (AUC-ROC: 0.724, 95% CI: 0.720-0.728, p < 0.001), Cox PH model with conventional risk factors (AUC-ROC: 0.734, 95% CI: 0.729-0.739, p < 0.001), and Cox PH model with all UK Biobank variables (AUC-ROC: 0.758, 95% CI: 0.753-0.763, p < 0.001). Out of 4,801 CVD cases recorded within 5 years of baseline, AutoPrognosis was able to correctly predict 368 more cases compared to the Framingham score. Our AutoPrognosis model included predictors that are not usually considered in existing risk prediction models, such as the individuals' usual walking pace and their self-reported overall health rating. Furthermore, our model improved risk prediction in potentially relevant sub-populations, such as in individuals with history of diabetes. We also highlight the relative benefits accrued from including more information into a predictive model (information gain) as compared to the benefits of using more complex models (modeling gain).
Conclusions: Our AutoPrognosis model improves the accuracy of CVD risk prediction in the UK Biobank population. This approach performs well in traditionally poorly served patient subgroups. Additionally, AutoPrognosis uncovered novel predictors for CVD that may now be tested in prospective studies. We found that the "information gain" achieved by considering more risk factors in the predictive model was significantly higher than the "modeling gain" achieved by adopting complex predictive models.
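For readers unfamiliar with the AUC-ROC metric used to compare these models, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, with ties counted as one half. This is a minimal illustration under stated assumptions, not the AutoPrognosis evaluation code; the function name `auc_roc` and the five toy risk scores are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of cases (1) and non-cases (0).

    Quadratic in the number of participants, so suitable only for
    small illustrative inputs.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                wins += 1.0   # case ranked above non-case
            elif case_score == other_score:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(non_cases))


# Toy data (invented for illustration only): 1 = developed CVD, 0 = did not.
labels = [1, 0, 1, 0, 0]
risk_scores = [0.9, 0.4, 0.35, 0.2, 0.9]
auc = auc_roc(labels, risk_scores)
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking of cases above non-cases.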

Project description:
Background: Previous studies have revealed the involvement of coffee and tea in the development of stroke and dementia. However, little is known about the association between the combination of coffee and tea and the risk of stroke, dementia, and poststroke dementia. Therefore, we aimed to investigate the associations of coffee and tea separately and in combination with the risk of developing stroke and dementia.
Methods and findings: This prospective cohort study included 365,682 participants (50 to 74 years old) from the UK Biobank. Participants joined the study from 2006 to 2010 and were followed up until 2020. We used Cox proportional hazards models to estimate the associations between coffee/tea consumption and incident stroke and dementia, adjusting for sex, age, ethnicity, qualification, income, body mass index (BMI), physical activity, alcohol status, smoking status, diet pattern, consumption of sugar-sweetened beverages, high-density lipoprotein (HDL), low-density lipoprotein (LDL), history of cancer, history of diabetes, history of cardiovascular arterial disease (CAD), and hypertension. Coffee and tea consumption was assessed at baseline. During a median follow-up of 11.4 years for new onset disease, 5,079 participants developed dementia, and 10,053 participants developed stroke. The associations of coffee and tea with stroke and dementia were nonlinear (P for nonlinearity < 0.01), and coffee intake of 2 to 3 cups/d or tea intake of 3 to 5 cups/d or their combination intake of 4 to 6 cups/d were linked with the lowest hazard ratio (HR) of incident stroke and dementia. Compared with those who did not drink tea and coffee, drinking 2 to 3 cups of coffee and 2 to 3 cups of tea per day was associated with a 32% (HR, 0.68, 95% CI, 0.59 to 0.79; P < 0.001) lower risk of stroke and a 28% (HR, 0.72, 95% CI, 0.59 to 0.89; P = 0.002) lower risk of dementia. Moreover, the combination of coffee and tea consumption was associated with lower risk of ischemic stroke and vascular dementia. Additionally, the combination of tea and coffee was associated with a lower risk of poststroke dementia, with the lowest risk of incident poststroke dementia at a daily consumption level of 3 to 6 cups of coffee and tea (HR, 0.52, 95% CI, 0.32 to 0.83; P = 0.007). The main limitations were that coffee and tea intake was self-reported at baseline and may not reflect long-term consumption patterns, unmeasured confounders in observational studies may result in biased effect estimates, and UK Biobank participants are not representative of the whole United Kingdom population.
Conclusions: We found that drinking coffee and tea, separately or in combination, was associated with lower risk of stroke and dementia. Intake of coffee alone or in combination with tea was associated with lower risk of poststroke dementia.

Project description:
Purpose: The retina provides biomarkers of neuronal and vascular health that offer promising insights into cognitive ageing, mild cognitive impairment and dementia. This article described the rationale and methodology of eye and vision assessments with the aim of supporting the study of dementia in the UK Biobank Repeat Imaging study.
Participants: UK Biobank is a large-scale, multicentre, prospective cohort containing in-depth genetic, lifestyle, environmental and health information from half a million participants aged 40-69 enrolled in 2006-2010 across the UK. A subset (up to 60 000 participants) of the cohort will be invited to the UK Biobank Repeat Imaging Study to collect repeated brain, cardiac and abdominal MRI scans, whole-body dual-energy X-ray absorptiometry, carotid ultrasound, as well as retinal optical coherence tomography (OCT) and colour fundus photographs.
Findings to date: UK Biobank has helped make significant advances in understanding risk factors for many common diseases, including for dementia and cognitive decline. Ophthalmic genetic and epidemiology studies have also benefited from the unparalleled combination of very large numbers of participants, deep phenotyping and longitudinal follow-up of the cohort, with comprehensive health data linkage to disease outcomes. In addition, we have used UK Biobank data to describe the relationship between retinal structures, cognitive function and brain MRI-derived phenotypes.
Future plans: The collection of eye-related data (eg, OCT), as part of the UK Biobank Repeat Imaging study, will take place in 2022-2028. The depth and breadth and longitudinal nature of this dataset, coupled with its open-access policy, will create a major new resource for dementia diagnostic discovery and to better understand its association with comorbid diseases. In addition, the broad and diverse data available in this study will support research into ophthalmic diseases and various other health outcomes beyond dementia.

Project description:
Importance: An increasing body of evidence indicates an association between consuming sugar or its alternatives and cardiometabolic diseases. However, the effects of the consumption of sugar-sweetened beverages, artificially sweetened beverages, and natural juices on kidney health remain unclear.
Objective: To investigate the association of the intake of sugar-sweetened beverages, artificially sweetened beverages, and natural juices with the risk of chronic kidney disease (CKD), and the effect of substituting these beverage types for one another on this association.
Design, setting, and participants: This prospective, population-based cohort study analyzed data from the UK Biobank. Participants without a history of CKD who completed at least 1 dietary questionnaire were included. The follow-up period was from the date of the last dietary questionnaire until October 31, 2022, in England; July 31, 2021, in Scotland; and February 28, 2018, in Wales. Data were analyzed from May 1 to August 1, 2023.
Exposures: Consumption of sugar-sweetened beverages, artificially sweetened beverages, and natural juices.
Main outcomes and measures: The primary outcome was incident CKD. Multivariable Cox proportional hazards models were used to estimate the associations between the 3 beverage types and incident CKD. A substitution analysis was used to evaluate the effect on the associations of substituting one beverage type for another.
Results: A total of 127 830 participants (mean [SD] age, 55.2 [8.0] years; 66 180 female [51.8%]) were included in the primary analysis. During a median (IQR) follow-up of 10.5 (10.4-11.2) years, 4459 (3.5%) cases of incident CKD occurred. The consumption of more than 1 serving per day of sugar-sweetened beverages was associated with higher risk of incident CKD (adjusted hazard ratio [AHR], 1.19 [95% CI, 1.05-1.34]) compared with not consuming sugar-sweetened beverages. The AHR for participants consuming more than 0 to 1 serving per day of artificially sweetened beverages was 1.10 (95% CI, 1.01-1.20) and for consuming more than 1 serving per day was 1.26 (95% CI, 1.12-1.43) compared with consuming no artificially sweetened beverages. By contrast, there was no significant association between natural juice intake and incident CKD (eg, for >1 serving per day: HR, 0.99 [95% CI, 0.87-1.11]; P = .10). Substituting sugar-sweetened beverages with artificially sweetened beverages did not show any significant difference in the risk of CKD (HR, 1.03 [95% CI, 0.96-1.10]). Conversely, replacing 1 serving per day of sugar-sweetened beverage with natural juice (HR, 0.93 [95% CI, 0.87-0.97]) or water (HR, 0.93 [95% CI, 0.88-0.99]) or replacing 1 serving per day of artificially sweetened beverage with natural juice (HR, 0.90 [95% CI, 0.84-0.96]) or water (HR, 0.91 [95% CI, 0.86-0.96]) was associated with a reduced risk of incident CKD.
Conclusions and relevance: Findings from this cohort study suggest that lower consumption of sugar-sweetened beverages or artificially sweetened beverages may reduce the risk of developing CKD.