Dataset Information

Performance and clinical utility of supervised machine-learning approaches in detecting familial hypercholesterolaemia in primary care.

ABSTRACT: Familial hypercholesterolaemia (FH) is a common inherited disorder, causing lifelong elevated low-density lipoprotein cholesterol (LDL-C). Most individuals with FH remain undiagnosed, precluding opportunities to prevent premature heart disease and death. Some machine-learning approaches improve detection of FH in electronic health records, though their clinical impact is under-explored. We assessed the performance of an array of machine-learning approaches for enhancing detection of FH, and their clinical utility, within a large primary care population. A retrospective cohort study was done using routine primary care clinical records of 4,027,775 individuals from the United Kingdom with total cholesterol measured from 1 January 1999 to 25 June 2019. The predictive accuracy of five common machine-learning algorithms (logistic regression, random forest, gradient boosting machines, neural networks and ensemble learning) was assessed for detecting FH. Predictive accuracy was assessed by area under the receiver operating characteristic curve (AUC) and expected vs observed calibration slope, with clinical utility assessed by expected case-review workload and likelihood ratios. There were 7928 incident diagnoses of FH. In addition to known clinical features of FH (raised total cholesterol or LDL-C and family history of premature coronary heart disease), machine-learning (ML) algorithms identified features such as raised triglycerides which reduced the likelihood of FH. Apart from logistic regression (AUC 0.81), all four other ML approaches had similarly high predictive accuracy (AUC > 0.89). Calibration slope ranged from 0.997 for gradient boosting machines to 1.857 for logistic regression. Among those screened, the proportion of high-probability cases requiring clinical review varied from 0.73% using ensemble learning to 10.16% using deep learning, with positive predictive values of 15.5% and 2.8%, respectively. Ensemble learning had a markedly higher positive likelihood ratio (45.5) than all other ML models (7.0-14.4). Machine-learning models show similarly high accuracy in detecting FH, offering opportunities to increase diagnosis. However, the clinical case-finding workload required to yield cases will differ substantially between models.
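The abstract compares models on case-review workload, positive predictive value (PPV) and positive likelihood ratio (LR+). A minimal Python sketch, assuming a single 2x2 table at one chosen probability cut-off, shows how these quantities relate. The counts are made up for illustration and are not from the study, and the names `ScreeningCounts`, `screening_summary` and `ppv_from_lr` are hypothetical helpers, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Hypothetical 2x2 counts for one case-finding model at one probability cut-off."""
    tp: int  # flagged for review and truly FH
    fp: int  # flagged for review but not FH
    fn: int  # FH but not flagged
    tn: int  # neither flagged nor FH

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def screening_summary(c: ScreeningCounts) -> dict:
    """Workload and yield metrics of the kind compared in the abstract."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        # share of the screened population that needs clinical case review
        "review_workload": (c.tp + c.fp) / c.total,
        # positive predictive value: yield of true cases among those reviewed
        "ppv": c.tp / (c.tp + c.fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # positive likelihood ratio: factor by which a flag multiplies the odds of FH
        "lr_positive": float("inf") if specificity == 1 else sensitivity / (1 - specificity),
    }


def ppv_from_lr(prevalence: float, lr_positive: float) -> float:
    """Bayes in odds form: posterior odds = prior odds * LR+, converted back to a probability."""
    prior_odds = prevalence / (1 - prevalence)
    posterior_odds = prior_odds * lr_positive
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    counts = ScreeningCounts(tp=30, fp=170, fn=20, tn=9780)  # made-up numbers
    summary = screening_summary(counts)
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
    prevalence = (counts.tp + counts.fn) / counts.total
    print(f"ppv via LR+: {ppv_from_lr(prevalence, summary['lr_positive']):.4f}")
```

For these made-up counts, the last line reproduces the directly computed PPV of 0.15. Because the prior odds of FH in a general primary-care population are small, even a large LR+ can translate into a modest PPV, which is one way to see why workload and yield can diverge between models with similar AUCs.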

SUBMITTER: Akyea RK

PROVIDER: S-EPMC7603302 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

Performance and clinical utility of supervised machine-learning approaches in detecting familial hypercholesterolaemia in primary care.

Akyea RK, Qureshi N, Kai J, Weng SF

NPJ Digital Medicine, 30 October 2020

PMID: 33145438

Similar Datasets

Project description: Background: Cardiovascular outcomes for people with familial hypercholesterolaemia can be improved with diagnosis and medical management. However, 90% of individuals with familial hypercholesterolaemia remain undiagnosed in the USA. We aimed to accelerate early diagnosis and timely intervention for more than 1·3 million undiagnosed individuals with familial hypercholesterolaemia at high risk for early heart attacks and strokes by applying machine learning to large health-care encounter datasets. Methods: We trained the FIND FH machine learning model using deidentified health-care encounter data, including procedure and diagnostic codes, prescriptions, and laboratory findings, from 939 clinically diagnosed individuals with familial hypercholesterolaemia (395 of whom had a molecular diagnosis) and 83 136 individuals presumed free of familial hypercholesterolaemia, sampled from four US institutions. The model was then applied to a national health-care encounter database (170 million individuals) and an integrated health-care delivery system dataset (174 000 individuals). Individuals used in model training and those evaluated by the model were required to have at least one cardiovascular disease risk factor (eg, hypertension, hypercholesterolaemia, or hyperlipidemia). A Health Insurance Portability and Accountability Act of 1996-compliant programme was developed to allow providers to receive identification of individuals likely to have familial hypercholesterolaemia in their practice. Findings: Using a model with a measured precision (positive predictive value) of 0·85, recall (sensitivity) of 0·45, area under the precision-recall curve of 0·55, and area under the receiver operating characteristic curve of 0·89, we flagged 1 331 759 of 170 416 201 patients in the national database and 866 of 173 733 individuals in the health-care delivery system dataset as likely to have familial hypercholesterolaemia. Familial hypercholesterolaemia experts reviewed a sample of flagged individuals (45 from the national database and 103 from the health-care delivery system dataset) and applied clinical familial hypercholesterolaemia diagnostic criteria. Of those reviewed, 87% (95% CI 73-100) in the national database and 77% (68-86) in the health-care delivery system dataset were categorised as having a high enough clinical suspicion of familial hypercholesterolaemia to warrant guideline-based clinical evaluation and treatment. Interpretation: The FIND FH model successfully scans large, diverse, and disparate health-care encounter databases to identify individuals with familial hypercholesterolaemia. Funding: The FH Foundation funded this study. Support was received from Amgen, Sanofi, and Regeneron.
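The FIND FH description reports precision, recall and the area under the ROC curve. A small Python sketch on toy data (not FIND FH data) illustrates the trade-off: raising the flagging cut-off tends to increase precision while lowering recall, and AUROC can be read as the probability that a randomly chosen case scores above a randomly chosen non-case. The helper names are hypothetical, and this is not the FIND FH implementation; the pairwise AUROC loop is quadratic and is meant only for tiny examples.

```python
from typing import Sequence, Tuple


def precision_recall_at(labels: Sequence[int], scores: Sequence[float],
                        threshold: float) -> Tuple[float, float]:
    """Precision (PPV) and recall (sensitivity) when flagging every score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random case outscores random non-case); ties count as half a win.

    Quadratic in the number of records: fine for toy data, not for national databases.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0, 0, 0]  # toy data, not from FIND FH
    scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
    print("AUROC:", auroc(labels, scores))
    for t in (0.75, 0.5, 0.25):
        p, r = precision_recall_at(labels, scores, t)
        print(f"threshold {t}: precision {p:.2f}, recall {r:.2f}")
```

On this toy set, the highest cut-off gives perfect precision but misses one of the three cases, while the lowest catches every case at the cost of two false positives. A reported operating point such as precision 0·85 with recall 0·45 sits towards the high-precision end of such a curve.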

Project description: Background: Chronic spinal pain conditions affect millions of US adults and carry a high healthcare cost burden, both direct and indirect. Conservative interventions for spinal pain conditions, including chiropractic care, have been associated with lower healthcare costs and improvements in pain status in different clinical populations, including veterans. Little is currently known about predicting healthcare service utilization in the domain of conservative interventions for spinal pain conditions, including the frequency of use of chiropractic services. The purpose of this retrospective cohort study was to explore the use of supervised machine learning approaches to predict one-year chiropractic service utilization by veterans receiving VA chiropractic care. Methods: We included 19,946 veterans who entered the Musculoskeletal Diagnosis Cohort between October 1, 2003 and September 30, 2013 and utilized VA chiropractic services within one year of cohort entry. The primary outcome was one-year chiropractic service utilization following the index chiropractic visit, split into quartiles represented by the following classes: 1 visit, 2 to 3 visits, 4 to 6 visits, and 7 or more visits. We compared the performance of four multiclass classification algorithms (gradient boosted classifier, stochastic gradient descent classifier, support vector classifier, and artificial neural network) in predicting visit quartile using 158 sociodemographic and clinical features. Results: The selected algorithms demonstrated poor prediction capabilities. Subset accuracy was 42.1% for the gradient boosted classifier, 38.6% for the stochastic gradient descent classifier, 41.4% for the support vector classifier, and 40.3% for the artificial neural network. The micro-averaged area under the precision-recall curve for each one-versus-rest classifier was 0.43 for the gradient boosted classifier, 0.38 for the stochastic gradient descent classifier, 0.43 for the support vector classifier, and 0.42 for the artificial neural network. Each model yielded only a small positive shift in prediction probability (approximately 15%) compared with naïve classification. Conclusions: Using supervised machine learning to predict chiropractic service utilization remains challenging, with only a small shift in predictive probability over naïve classification and limited clinical utility. Future work should examine mechanisms to improve model performance.
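The chiropractic study reports subset accuracy and a micro-averaged area under the precision-recall curve for one-versus-rest classifiers. The pure-Python sketch below assumes the common step-wise "average precision" definition of that area (sum of recall increments times precision) and pools every (record, class) pair before scoring, which is what micro-averaging means here; the study's exact software and settings are not stated, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve: sum of (R_n - R_{n-1}) * P_n.

    Tied scores are treated as a single threshold.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # consume every record tied at this score before taking a precision/recall point
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def micro_average_precision(y_true: Sequence[int],
                            proba: Sequence[Sequence[float]]) -> float:
    """Micro-average over one-vs-rest problems: pool every (record, class) pair, then score once."""
    flat_labels: List[int] = []
    flat_scores: List[float] = []
    for y, row in zip(y_true, proba):
        for k, p in enumerate(row):
            flat_labels.append(1 if y == k else 0)
            flat_scores.append(p)
    return average_precision(flat_labels, flat_scores)


def subset_accuracy(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    """For single-label multiclass data, subset accuracy is plain accuracy of the arg-max class."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in proba]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


if __name__ == "__main__":
    y_true = [0, 1, 2, 3]  # toy visit-quartile labels
    uninformative = [[0.25] * 4 for _ in y_true]
    print("micro AP, constant scores:", micro_average_precision(y_true, uninformative))
```

With four balanced quartile classes, pooling one-vs-rest pairs gives a positive rate of 1/4, and a model that assigns every class the same score attains a micro-averaged average precision of exactly that rate; it is a natural no-skill reference point for values such as 0.38 to 0.43.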

Project description: Objective: Familial hypercholesterolaemia (FH) is a common inherited disorder causing premature coronary heart disease (CHD) and death. We have developed the novel Familial Hypercholesterolaemia Case Ascertainment Tool (FAMCAT 1) case-finding algorithm for application in primary care, to improve detection of FH. The performance of this algorithm was further improved by including personal history of premature CHD (FAMCAT 2 algorithm). This study evaluated their performance, at 95% specificity, in detecting genetically confirmed FH in the general population. We also compared these algorithms with established clinical case-finding criteria. Methods: Prospective validation study in 14 general practices, recruiting participants from the general adult population with cholesterol documented. For 260 participants with available health records, we determined possible FH cases based on FAMCAT thresholds, Dutch Lipid Clinic Network (DLCN) score, Simon-Broome criteria and recommended cholesterol thresholds (total cholesterol >9.0 mmol/L if ≥30 years or >7.5 mmol/L if <30 years), using clinical data from electronic and manual extraction of patient records and family history questionnaires. The reference standard was genetic testing. We examined detection rate (DR), sensitivity and specificity for each case-finding criterion. Results: At 95% specificity, FAMCAT 1 had a DR of 27.8% (95% CI 12.5% to 50.9%) with sensitivity of 31.2% (95% CI 11.0% to 58.7%), while FAMCAT 2 had a DR of 45.8% (95% CI 27.9% to 64.9%) with sensitivity of 68.8% (95% CI 41.3% to 89.0%). A DLCN score ≥6 points yielded a DR of 35.3% (95% CI 17.3% to 58.7%) and sensitivity of 37.5% (95% CI 15.2% to 64.6%). Using recommended cholesterol thresholds resulted in a DR of 28.0% (95% CI 14.3% to 47.6%) with sensitivity of 43.8% (95% CI 19.8% to 70.1%). Simon-Broome criteria had a lower DR of 11.3% (95% CI 6.0% to 20.0%) and specificity of 70.9% (95% CI 64.8% to 76.5%) but higher sensitivity of 56.3% (95% CI 29.9% to 80.2%). Conclusions: In primary care, in patients with cholesterol documented, FAMCAT 2 performs better than other case-finding criteria for detecting genetically confirmed FH, with no prior clinical review required for case finding. Trial registration number: NCT03934320.
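The FAMCAT validation compares case-finding rules at a fixed 95% specificity against genetic testing. The Python sketch below encodes the quoted cholesterol rule and one straightforward way, assumed here, of picking a score cut-off from the non-cases so that specificity stays at or above 95%. It also reads "detection rate" as the proportion of flagged participants with confirmed FH (equivalent to PPV); that reading, the cut-off procedure and all function names are illustrative assumptions rather than the study's published method.

```python
from typing import Dict, Sequence


def meets_cholesterol_threshold(total_chol_mmol_l: float, age_years: float) -> bool:
    """Cholesterol rule quoted above: TC > 9.0 mmol/L if aged >= 30, TC > 7.5 mmol/L if under 30."""
    limit = 9.0 if age_years >= 30 else 7.5
    return total_chol_mmol_l > limit


def threshold_for_specificity(noncase_scores: Sequence[float],
                              target_specificity: float = 0.95) -> float:
    """Cut-off t such that flagging scores strictly above t keeps specificity >= target.

    At most floor(n * (1 - target)) non-cases may score above t; target must be in (0, 1].
    """
    ranked = sorted(noncase_scores, reverse=True)
    allowed = int(len(ranked) * (1 - target_specificity) + 1e-9)  # epsilon guards float error
    return ranked[allowed]


def evaluate_flags(flags: Sequence[bool], truth: Sequence[bool]) -> Dict[str, float]:
    """Detection rate (assumed here to equal PPV), sensitivity and specificity vs a reference standard."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    fp = sum(1 for f, t in zip(flags, truth) if f and not t)
    fn = sum(1 for f, t in zip(flags, truth) if not f and t)
    tn = sum(1 for f, t in zip(flags, truth) if not f and not t)
    nan = float("nan")
    return {
        "detection_rate": tp / (tp + fp) if tp + fp else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


if __name__ == "__main__":
    noncase = [float(i) for i in range(20)]  # hypothetical algorithm scores, gene-negative
    cases = [25.0, 22.0, 10.0, 3.0]          # hypothetical scores, gene-positive
    cut = threshold_for_specificity(noncase, 0.95)
    scores = noncase + cases
    truth = [False] * len(noncase) + [True] * len(cases)
    print(evaluate_flags([s > cut for s in scores], truth))
```

Fixing specificity first and then reading off detection rate and sensitivity mirrors how the abstract reports FAMCAT 1 and FAMCAT 2; fixed-rule criteria such as the cholesterol thresholds instead yield whatever specificity they happen to achieve.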
