Dataset Information

Development of a Portable Tool to Identify Patients With Atrial Fibrillation Using Clinical Notes From the Electronic Medical Record.

ABSTRACT:

Background

The electronic medical record contains a wealth of information buried in free text. We created a natural language processing algorithm to identify patients with atrial fibrillation (AF) using text alone.

Methods and results

We created 3 data sets from patients with at least one AF billing code from 2010 to 2017: a training set (n=886), an internal validation set from site no. 1 (n=285), and an external validation set from site no. 2 (n=276). A team of clinicians reviewed and adjudicated patients as AF present or absent, which served as the reference standard. We trained 54 algorithms to classify each patient, varying the model, number of features, number of stop words, and the method used to create the feature set. The algorithm with the highest F-score (the harmonic mean of sensitivity and positive predictive value) in the training set was applied to the validation sets. F-scores and area under the receiver operating characteristic curves were compared between site no. 1 and site no. 2 using bootstrapping. Adjudicated AF prevalence was 75.1% at site no. 1 and 86.2% at site no. 2. Among 54 algorithms, the best performing model was logistic regression, using 1000 features, 100 stop words, and term frequency-inverse document frequency method to create the feature set, with sensitivity 92.8%, specificity 93.9%, and an area under the receiver operating characteristic curve of 0.93 in the training set. The performance at site no. 1 was sensitivity 92.5%, specificity 88.7%, with an area under the receiver operating characteristic curve of 0.91. The performance at site no. 2 was sensitivity 89.5%, specificity 71.1%, with an area under the receiver operating characteristic curve of 0.80. The F-score was lower at site no. 2 compared with site no. 1 (92.5% [SD, 1.1%] versus 94.2% [SD, 1.1%]; P<0.001).
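The F-score defined above can be approximately reproduced from the reported sensitivity, specificity, and adjudicated prevalence. The following is a minimal sketch, not the authors' code: the helper names `ppv_from_rates` and `f_score` are hypothetical, and it assumes the stated prevalence applies to each validation set.

```python
# Illustrative sketch only: helper names are hypothetical, and the reported
# prevalence is assumed to describe each validation set.

def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def f_score(sensitivity, ppv):
    """Harmonic mean of sensitivity and positive predictive value."""
    return 2.0 * sensitivity * ppv / (sensitivity + ppv)


# (sensitivity, specificity, adjudicated AF prevalence) as reported in the abstract
sites = {
    "site no. 1": (0.925, 0.887, 0.751),
    "site no. 2": (0.895, 0.711, 0.862),
}

for name, (sens, spec, prev) in sites.items():
    ppv = ppv_from_rates(sens, spec, prev)
    print(f"{name}: PPV={ppv:.3f}, F-score={f_score(sens, ppv):.3f}")
```

Under these assumptions the point estimates come out near 0.94 for site no. 1 and 0.92 for site no. 2, close to the bootstrapped values reported above. Small differences are expected because the published figures are bootstrap means and the inputs here are rounded.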

Conclusions

We developed a natural language processing algorithm to identify patients with AF using text alone, with >90% F-score at 2 separate sites. This approach allows better use of the clinical narrative and creates an opportunity for precise, high-throughput cohort identification.

SUBMITTER: Shah RU

PROVIDER: S-EPMC7646941 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature


Publications

Development of a Portable Tool to Identify Patients With Atrial Fibrillation Using Clinical Notes From the Electronic Medical Record.

Shah RU, Mutharasan RK, Ahmad FS, Rosenblatt AG, Gay HC, Steinberg BA, Yandell M, Tristani-Firouzi M, Klewer J, Mukherjee R, Lloyd-Jones DM

Circulation. Cardiovascular Quality and Outcomes. 2020 Oct 14; 10


PMID: 33079591

Similar Datasets

Project description:

Background

Atrial fibrillation (AF) burden on patients and healthcare systems warrants innovative strategies for screening asymptomatic individuals.

Objective

We sought to externally validate a predictive model originally developed in a German population to detect unidentified incident AF utilising real-world primary healthcare databases from countries in Europe and Australia.

Methods

This retrospective cohort study used anonymized, longitudinal patient data from 5 country-level primary care databases, including Australia, Belgium, France, Germany, and the UK. The study eligibility included adult patients (≥45 years) with either an AF diagnosis (cases) or no diagnosis (controls) who had continuous enrolment in the respective database prior to the study period. Logistic regression was fitted to a binary response (yes/no) for AF diagnosis using pre-determined risk factors.

Results

AF patients were from Germany (n = 63,562), the UK (n = 42,652), France (n = 7,213), Australia (n = 2,753), and Belgium (n = 1,371). Cases were more likely to have hypertension or other cardiac conditions than controls in all validation datasets compared to the model development data. The area under the receiver operating characteristic (ROC) curve in the validation datasets ranged from 0.79 (Belgium) to 0.84 (Germany), comparable to the German study model, which had an area under the curve of 0.83. Most validation sets reported similar specificity at approximately 80% sensitivity, ranging from 67% (France) to 71% (United Kingdom). The positive predictive value (PPV) ranged from 2% (Belgium) to 16% (Germany), and the number needed to be screened was 50 in Belgium and 6 in Germany. The prevalence of AF varied widely between these datasets, which may be related to different coding practices. Low prevalence affected PPV, but not sensitivity, specificity, and ROC curves.

Conclusions

AF risk prediction algorithms offer targeted ways to identify patients using electronic health records, which could improve screening number and the cost-effectiveness of AF screening if implemented in clinical practice.
