Dataset Information

Detecting Miscoded Diabetes Diagnosis Codes in Electronic Health Records for Quality Improvement: Temporal Deep Learning Approach.

ABSTRACT:

Background

Diabetes affects more than 30 million patients across the United States. With such a large disease burden, even a small error in classification can be significant. Currently, billing codes assigned at the time of a medical encounter are the "gold standard" reflecting the actual diseases present in an individual, and thus in aggregate reflect disease prevalence in the population. These codes are generated by highly trained coders and by health care providers but are not always accurate.

Objective

This work provides a scalable deep learning methodology to more accurately classify individuals with diabetes across multiple health care systems.

Methods

We leveraged a long short-term memory-dense neural network (LSTM-DNN) model to identify patients with or without diabetes using data from 5 acute care facilities with 187,187 patients and 275,407 encounters, incorporating data elements including laboratory test results, diagnostic/procedure codes, medications, demographic data, and admission information. Furthermore, a blinded physician panel reviewed discordant cases, providing an estimate of the total impact on the population.
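The abstract does not say how the LSTM and dense parts are wired together. The following is a minimal sketch of one common arrangement, assuming the LSTM summarizes a patient's encounter sequence and its final hidden state is concatenated with static features (for example demographics) before a dense head. The class name `TinyLSTMDense`, the feature layout, the layer sizes, and the random untrained weights are all hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute w @ x + b for a row-major weight matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMDense:
    """Hypothetical LSTM-plus-dense classifier with untrained random weights."""

    def __init__(self, n_temporal: int, n_static: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        # One (weights, bias) pair per gate, applied to [h_prev, x_t] concatenated.
        self.gates = {
            name: (random_matrix(hidden, hidden + n_temporal, rng), [0.0] * hidden)
            for name in ("input", "forget", "output", "candidate")
        }
        # Dense head: assumed to see the final hidden state plus static features.
        self.dense_w = random_matrix(hidden, hidden + n_static, rng)
        self.dense_b = [0.0] * hidden
        self.out_w = random_matrix(1, hidden, rng)
        self.out_b = [0.0]

    def lstm_step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = h + x  # list concatenation of previous hidden state and current input

        def gate(name: str) -> Vector:
            w, b = self.gates[name]
            return affine(w, z, b)

        i = [sigmoid(v) for v in gate("input")]
        f = [sigmoid(v) for v in gate("forget")]
        o = [sigmoid(v) for v in gate("output")]
        g = [math.tanh(v) for v in gate("candidate")]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, encounters: List[Vector], static: Vector) -> float:
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in encounters:  # encounters in chronological order
            h, c = self.lstm_step(x, h, c)
        dense = [max(0.0, v) for v in affine(self.dense_w, h + static, self.dense_b)]
        return sigmoid(affine(self.out_w, dense, self.out_b)[0])


# Toy patient: two encounters with three made-up temporal features each,
# plus two made-up static features.
model = TinyLSTMDense(n_temporal=3, n_static=2, hidden=4)
encounters = [[0.2, 1.0, 0.0], [0.8, 1.0, 1.0]]
static = [0.63, 1.0]
probability = model.predict_proba(encounters, static)
```

Because the weights are random, `probability` carries no clinical meaning; the sketch only shows how variable-length encounter histories and fixed patient attributes can feed a single diabetes score.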

Results

When predicting the documented diagnosis of diabetes, our model achieved an 84% F1 score, 96% area under the receiver operating characteristic curve, and 91% average precision on a heterogeneous data set from 5 distinct health facilities. However, in 81% of cases where the model disagreed with the documented phenotype, a blinded physician panel agreed with the model. Taken together, this suggests that 4.3% of our studied population have either a missing or an improper diabetes diagnosis.
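For readers less familiar with the three reported metrics, here is a minimal pure-Python sketch of how each is computed from labels and model scores. The toy labels, scores, and the 0.5 decision threshold are illustrative assumptions and are not data from the study; the average-precision helper also assumes no tied scores.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean precision at the rank of each positive, ranking by descending score."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Toy example: 1 = documented diabetes diagnosis, 0 = none (made-up values).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
```

All three metrics here are scored against the documented billing-code labels, which is why a high score can coexist with the physician panel frequently siding with the model on discordant cases.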

Conclusions

This study demonstrates that deep learning methods can improve clinical phenotyping even when patient data are noisy, sparse, and heterogeneous.

SUBMITTER: Rashidian S

PROVIDER: S-EPMC7775195 | biostudies-literature | 2020 Dec

REPOSITORIES: biostudies-literature

Publications

Detecting Miscoded Diabetes Diagnosis Codes in Electronic Health Records for Quality Improvement: Temporal Deep Learning Approach.

Sina Rashidian, Kayley Abell-Hart, Janos Hajagos, Richard Moffitt, Veena Lingam, Victor Garcia, Chao-Wei Tsai, Fusheng Wang, Xinyu Dong, Siao Sun, Jianyuan Deng, Rajarsi Gupta, Joshua Miller, Joel Saltz, Mary Saltz

JMIR Medical Informatics. 2020 Dec 17; issue 12

PMID: 33331828

Similar Datasets

Project description: Importance: Patients can develop multiple skin cancers, and their medical data can be spread over multiple health care systems. This fragmented care, combined with the lack of skin cancer registries, has limited our ability both to provide accurate estimates of incidence and to study the pathogenesis of multiple skin cancers. Objective: To assess whether standard diagnostic and procedural codes present in the electronic health records at a single health care system are a valid proxy for estimating the number of overall skin cancers. Design, setting, and participants: Retrospective cohort study of patients seen at a single-center tertiary care hospital (ie, Vanderbilt University Medical Center) between July 1, 2008, and June 30, 2018. All patients with at least 1 electronic health record-based diagnostic or procedural code for any skin cancer and at least 1 pathology report of a skin cancer. Exposure: The number of International Classification of Disease (ICD) or Current Procedural Terminology (CPT) codes relating to skin cancer. Main outcomes and measures: Pearson correlation coefficient and R² were calculated for the total number of ICD or CPT codes for skin cancer and histologically verified skin cancers. Results: In this cohort study of 35 901 patients, the mean (SD) age was 70.4 (14.4) years, 20 404 (56.8%) were men, and 31 623 (88.1%) were White individuals. Of these patients, 6307 had at least 1 ICD or CPT code or pathology report for a skin cancer, of whom 5688 patients had both a CPT code related to skin malignancy and a histologically verified skin cancer. There was a strong linear correlation between the number of CPT codes and pathology records (r = 0.87). There was a poor correlation between the number of ICD codes and pathology records (r = 0.22). Conclusions and relevance: This cohort study found that the use of ICD codes was a poor proxy measure for the number of skin cancers per patient. In contrast, CPT codes accounted for more than 75% of the variability in the number of skin cancers (R² = 0.76) and were a better proxy measure for the total number of skin cancers per patient.
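The reported r = 0.87 and R² = 0.76 are consistent with each other because, for a least-squares line with a single predictor, R² equals the square of the Pearson correlation. A small sketch of that relationship follows; the toy count data are made up for illustration and are not from the cohort.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared_simple_linear(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R^2 of the least-squares line y = a + b*x, computed from residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up per-patient counts: procedure codes vs. pathology-confirmed cancers.
code_counts = [1, 2, 3, 4, 6]
pathology_counts = [1, 1, 3, 5, 5]

r = pearson_r(code_counts, pathology_counts)
r2 = r_squared_simple_linear(code_counts, pathology_counts)

# Squaring the correlations quoted in the abstract.
cpt_r2 = round(0.87 ** 2, 2)  # matches the reported R² for CPT codes
icd_r2 = round(0.22 ** 2, 2)  # implied share of variability for ICD codes
```

By the same arithmetic, the ICD correlation of 0.22 corresponds to roughly 5% of the variability, which is why the abstract calls ICD codes a poor proxy.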

Project description: Background: Statins are guideline-recommended medications that reduce cardiovascular events in patients with diabetes. Yet, statin use is concerningly low in this high-risk population. Identifying reasons for statin nonuse, which are typically described in unstructured electronic health record data, can inform targeted system interventions to improve statin use. We aimed to leverage a deep learning approach to identify reasons for statin nonuse in patients with diabetes. Methods and Results: Adults with diabetes and no statin prescriptions were identified from a multiethnic, multisite Northern California electronic health record cohort from 2014 to 2020. We used a benchmark deep learning natural language processing approach (Clinical Bidirectional Encoder Representations from Transformers) to identify statin nonuse and reasons for statin nonuse from unstructured electronic health record data. Performance was evaluated against expert clinician review from manual annotation of clinical notes and compared with other natural language processing approaches. Of 33 461 patients with diabetes (mean age 59±15 years, 49% women, 36% White patients, 24% Asian patients, and 15% Hispanic patients), 47% (15 580) had no statin prescriptions. From unstructured data, Clinical Bidirectional Encoder Representations from Transformers accurately identified statin nonuse (area under receiver operating characteristic curve [AUC] 0.99 [0.98-1.0]) and key patient (eg, side effects/contraindications), clinician (eg, guideline-discordant practice), and system reasons (eg, clinical inertia) for statin nonuse (AUC 0.90 [0.86-0.93]) and outperformed other natural language processing approaches. Reasons for nonuse varied by clinical and demographic characteristics, including race and ethnicity. Conclusions: A deep learning algorithm identified statin nonuse and actionable reasons for statin nonuse in patients with diabetes. Findings may enable targeted interventions to improve guideline-directed statin use and be scaled to other evidence-based therapies.
