Dataset Information

Extracting information from the text of electronic medical records to improve case detection: a systematic review.

ABSTRACT:

Background

Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality.

Methods

A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed.

Results

Studies mainly used US hospital-based EMRs and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning extraction methods. Including information from text significantly improved algorithm sensitivity and area under the receiver operating characteristic curve (AUROC) compared with codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median AUROC 95% (codes + text) vs 88% (codes), P = .025).
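
The codes-versus-codes-plus-text comparison can be illustrated with a small sketch. It is a minimal Python example, not the method of any reviewed study. The following are all hypothetical and chosen for illustration:

- the keyword list;
- the naive negation rule, which is loosely inspired by NegEx but far simpler;
- the use of the ICD-10 category I50 (heart failure) as the code definition;
- the six-patient cohort.

```python
import re

# Illustrative only: a hypothetical keyword list and an ICD-10 category
# (I50, heart failure) stand in for a real, validated case definition.
KEYWORDS = ("heart failure", "cardiac failure")
CASE_CODE_PREFIXES = frozenset({"I50"})
# Naive negation rule (an assumption, loosely in the spirit of NegEx but not
# NegEx itself): a cue word shortly before the keyword negates the mention.
NEGATION = re.compile(r"\b(no|not|denies|without|negative for)\b")


def text_positive(note: str, window: int = 30) -> bool:
    """True if a keyword occurs without a negation cue in the preceding window."""
    lowered = note.lower()
    for kw in KEYWORDS:
        for match in re.finditer(re.escape(kw), lowered):
            preceding = lowered[max(0, match.start() - window):match.start()]
            if not NEGATION.search(preceding):
                return True
    return False


def code_positive(codes: list[str]) -> bool:
    """True if any code's category (the part before the dot) is a case code."""
    return any(code.split(".")[0] in CASE_CODE_PREFIXES for code in codes)


def sensitivity(predicted: list[bool], actual: list[bool]) -> float:
    """Recall: true positives divided by all true cases."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    return tp / (tp + fn)


# Six made-up patients: (codes, free-text note, reference-standard case status).
cohort = [
    (["I50.9"], "Known heart failure, stable.", True),
    ([], "Admitted with worsening cardiac failure.", True),
    ([], "Echo consistent with heart failure.", True),
    (["E11.9"], "No evidence of heart failure.", False),
    ([], "Denies symptoms of heart failure.", False),
    (["I50.0"], "", True),
]

actual = [case for _, _, case in cohort]
codes_only = [code_positive(codes) for codes, _, _ in cohort]
codes_plus_text = [code_positive(codes) or text_positive(note)
                   for codes, note, _ in cohort]

print(sensitivity(codes_only, actual))       # 0.5
print(sensitivity(codes_plus_text, actual))  # 1.0
```

Combining the two rules with OR can only add predicted positives relative to codes alone. Sensitivity therefore cannot fall, but positive predictive value can if the text rule produces false positives. A fixed character window is also crude: it can pick up a negation cue from an unrelated earlier phrase.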

Conclusions

Text in EMRs is accessible, especially with open-source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics such as positive predictive value (precision) and sensitivity (recall).
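
The review calls for standardized reporting of PPV (precision) and sensitivity (recall). The following minimal Python sketch computes both from a 2 x 2 validation table, together with specificity and the F-measure. The class and function names and the example counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from validating a case-detection algorithm against a reference standard."""
    tp: int  # algorithm positive, true case
    fp: int  # algorithm positive, not a case
    fn: int  # algorithm negative, true case (missed)
    tn: int  # algorithm negative, not a case


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else math.nan


def accuracy_metrics(c: ConfusionCounts) -> dict[str, float]:
    ppv = _ratio(c.tp, c.tp + c.fp)          # positive predictive value = precision
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # sensitivity = recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    # F-measure: harmonic mean of precision and recall (NaN if undefined).
    f_measure = (2 * ppv * sensitivity / (ppv + sensitivity)
                 if ppv + sensitivity > 0 else math.nan)
    return {"ppv": ppv, "sensitivity": sensitivity,
            "specificity": specificity, "f_measure": f_measure}


# Hypothetical validation counts, not taken from any reviewed study.
metrics = accuracy_metrics(ConfusionCounts(tp=80, fp=20, fn=20, tn=880))
print({k: round(v, 3) for k, v in metrics.items()})
# {'ppv': 0.8, 'sensitivity': 0.8, 'specificity': 0.978, 'f_measure': 0.8}
```

PPV depends on the prevalence of the condition in the validation sample, whereas sensitivity does not. For that reason, reporting only one of the two can give a misleading picture of an algorithm.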

SUBMITTER: Ford E

PROVIDER: S-EPMC4997034 | biostudies-literature | 2016 Sep

REPOSITORIES: biostudies-literature

Publications

Extracting information from the text of electronic medical records to improve case detection: a systematic review.

Elizabeth Ford, John A Carroll, Helen E Smith, Donia Scott, Jackie A Cassell

Journal of the American Medical Informatics Association (JAMIA), 2016 Feb 5, issue 5

PMID: 26911811

Similar Datasets

Project description:

Background and objective: Percutaneous coronary intervention (PCI) using drug-eluting stents (DES) is an indispensable treatment for coronary artery disease. However, evaluating the performance of the various types of stents used in PCI requires substantial resources. We extracted clinical information from free-text records and, using practice-based evidence, compared the efficacy of various DES.

Materials and methods: We developed a text-mining tool based on regular expressions and applied it to PCI reports stored in the electronic health records (EHRs) of Ajou University Hospital from 2010 to 2014. Compared with a sample of 200 reports, the PCI data were extracted from EHRs with a sensitivity of 0.996, a specificity of 1.000, and an F-measure of 0.995. Using these data, we compared the performance of stents using Kaplan-Meier analysis and Cox proportional hazards regression.

Results: In the self-validation analysis comparing first-generation with second-generation DES, the second-generation DES was superior to the first-generation DES in terms of target vessel revascularization (TVR) (hazard ratio [HR]: 0.423, 95% confidence interval [CI]: 0.284-0.630), consistent with the established results of previous studies. Among the second-generation DES, the biodegradable-polymer DES tended to be superior, with a TVR risk (HR: 0.568, 95% CI: 0.281-1.147) that fell below that of the durable-polymer DES approximately 1 year after the index procedure. The Endeavor stent had the highest TVR risk among the newer-generation DES (HR: 2.576, 95% CI: 1.273-5.210).

Conclusions: We demonstrated how to construct, in a time- and cost-effective manner, a data warehouse of PCI-related parameters extracted with high accuracy from free-text electronic records, for use in post-market surveillance of coronary stents. Surveillance of the practice-based evidence in the PCI data warehouse indicated that the biodegradable-polymer DES might have a lower risk of TVR than the durable-polymer DES.
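
The description above says only that the extraction tool was based on regular expressions; its actual patterns are not given. The following is a minimal Python sketch of the general idea. It assumes a hypothetical report line format of `Stent: <name> <diameter> x <length> mm`. The pattern `STENT_LINE`, the function `extract_stents`, and the sample report are all invented for illustration and are not the Ajou tool.

```python
import re

# Hypothetical report line format: "Stent: <name> <diameter> x <length> mm".
# Real PCI reports differ by institution; this pattern is illustrative only.
STENT_LINE = re.compile(
    r"Stent:\s*(?P<name>[A-Za-z][\w\- ]*?)\s+"
    r"(?P<diameter>\d+(?:\.\d+)?)\s*x\s*(?P<length>\d+(?:\.\d+)?)\s*mm",
    re.IGNORECASE,
)


def extract_stents(report: str) -> list[dict]:
    """Return the name, diameter (mm), and length (mm) of each stent line found."""
    return [
        {
            "name": m.group("name"),
            "diameter_mm": float(m.group("diameter")),
            "length_mm": float(m.group("length")),
        }
        for m in STENT_LINE.finditer(report)
    ]


# Made-up report text; the balloon line is deliberately not matched.
report = """PCI report
Stent: Endeavor 3.0 x 18 mm at proximal LAD
Stent: Resolute Integrity 2.5 x 12 mm at mid RCA
Balloon: 2.0 x 15 mm predilatation"""

for stent in extract_stents(report):
    print(stent)
# {'name': 'Endeavor', 'diameter_mm': 3.0, 'length_mm': 18.0}
# {'name': 'Resolute Integrity', 'diameter_mm': 2.5, 'length_mm': 12.0}
```

Patterns like this are brittle against variations in wording. The sensitivity and specificity of such a tool can be estimated by comparing its output with a sample of reports, as the study did with 200 reports.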

Project description:

Background: As the availability of interoperable electronic health records (iEHRs) or health information exchanges (HIEs) continues to increase, there is a greater need and opportunity to assess the current evidence base on what does and does not work regarding the adoption, use, and impact of iEHRs.

Objective: The purpose of this project is to assess the international evidence base on the adoption, use, and impact of iEHRs.

Methods: We conducted a systematic review, searching multiple databases (MEDLINE, Embase, and the Cumulative Index to Nursing and Allied Health Literature [CINAHL]), with supplemental searches in Google Scholar and grey literature sources (i.e., Google, Grey Literature Report, and OpenGrey). All searches were conducted in January and February 2017. Articles were eligible for inclusion if they were published in English between 2006 and 2017 and were either an original research study or a literature review. To be included, articles had to focus on iEHRs and HIEs across multiple health care settings and on the impact and effectiveness of iEHR adoption and use.

Results: We included 130 articles in the synthesis (113 primary studies, 86.9%; 17 reviews, 13.1%), the majority of which focused on the United States (88/130, 67.7%). The primary studies covered a wide range of health care settings; the three most frequently studied were acute care (59/113, 52.2%), primary care (44/113, 38.9%), and emergency departments (34/113, 30.1%). We identified 29 distinct measurement items in the 113 primary studies, linked to 522 specific measurement outcomes. Productivity and quality were the two evaluation dimensions that received the most attention, accounting for 14 of 29 (48%) measurement items and 306 of 522 (58.6%) measurement outcomes. Overall, most of the 522 measurement outcomes were positive (298/522, 57.1%). We also identified 17 reviews on iEHR use and impact. Of these, 6 (35%) focused on barriers and facilitators to adoption and implementation, and 11 (65%) focused on benefits and impacts; the more recent reviews found little generalizable evidence of benefit and impact.

Conclusions: This review captures the status of an evolving and active field focused on the use and impact of iEHRs. While the overall findings suggest many positive impacts, the quality of the primary studies was not evaluated systematically. When broken down by specific measurement item, the results drew attention both to measurement outcomes that were consistently positive and to others that were mostly negative or equivocal.
