Dataset Information

Concept libraries for automatic electronic health record based phenotyping: A review.

ABSTRACT:

Introduction

Electronic health records (EHR) are linked together to examine disease history and to undertake research into the causes and outcomes of disease. However, constructing algorithms for phenotyping (e.g., identifying disease characteristics) or health characteristics (e.g., smoker) is time-consuming and resource-intensive. In addition, results can vary greatly between researchers. Reusing or building on algorithms that others have created is a compelling solution to these problems. However, sharing algorithms is not common practice, and many published studies do not detail the clinical code lists the researchers used to define the disease or characteristic. To address these challenges, a number of centres across the world have developed health data portals containing concept libraries (e.g., algorithms for defining concepts such as diseases and characteristics) to facilitate disease phenotyping and health studies.
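As a rough illustration of what a shareable code list does in practice, the sketch below flags patients as having a concept (here, "smoker") when any of their recorded events carries a code from the list. Everything in it is an assumption made for illustration: the `Event` record, the `patients_with_concept` helper, and the codes themselves are hypothetical placeholders, not real clinical codes and not the method of any library covered by the review.

```python
from dataclasses import dataclass

# Hypothetical code list for a "smoker" concept. These codes are placeholders
# invented for this sketch; they do not belong to any real coding system.
SMOKER_CODES = {"SMK001", "SMK002", "SMK003"}


@dataclass(frozen=True)
class Event:
    """One coded entry in a patient's record (hypothetical structure)."""
    patient_id: str
    code: str


def patients_with_concept(events, code_list):
    """Return the IDs of patients with at least one event coded in code_list."""
    return {e.patient_id for e in events if e.code in code_list}


events = [
    Event("p1", "SMK001"),
    Event("p1", "XYZ999"),
    Event("p2", "XYZ999"),
    Event("p3", "SMK003"),
]

smokers = patients_with_concept(events, SMOKER_CODES)
print(sorted(smokers))
```

Because the definition lives entirely in the code list, two researchers who reuse the same list over the same records get the same cohort, which is the consistency argument made for concept libraries.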

Objectives

This study aims to review the literature of existing concept libraries, examine their utilities, identify the current gaps, and suggest future developments.

Methods

The five-stage framework of Arksey and O'Malley was used for the literature search. This approach included defining the research questions, identifying relevant studies through literature review, selecting eligible studies, charting and extracting data, and summarising and reporting the findings.

Results

This review identified seven publicly accessible electronic health data concept libraries, developed in different countries including the UK, USA, and Canada. The concept libraries (n = 7) were either general libraries that hold phenotypes of multiple specialties (n = 4) or specialized libraries that manage only certain specialties, such as rare diseases (n = 3). There were some clear differences between the general libraries, such as archiving data from different electronic sources and using a range of different coding systems. However, they share some clear similarities, such as enabling users to upload their own code lists and allowing users to use or download the publicly accessible code. The specialized libraries also differed in some respects, such as their search capabilities and whether they supported different kinds of query (simple or complex searches). Conversely, the specialized libraries shared some features, such as enabling users to upload their own concepts and to show where they were published, which facilitates assessing the validity of the concepts. All the specialized libraries aimed to encourage the reuse of research methods such as clinical code lists and/or metadata.

Conclusion

The seven libraries identified have been developed independently and appear to replicate similar concepts but in different ways. Collaboration between similar libraries would greatly facilitate the use of these libraries for the user. The process of building code lists takes time and effort. Access to existing code lists increases consistency and accuracy of definitions across studies. Concept library developers should collaborate with each other to raise awareness of their existence and of their various functions, which could increase users' contributions to those libraries and promote their wide-ranging adoption.

SUBMITTER: Almowil ZA

PROVIDER: S-EPMC8210840 | biostudies-literature | 2021 Jun

REPOSITORIES: biostudies-literature


Publications

Concept libraries for automatic electronic health record based phenotyping: A review.

Almowil, Zahra A; Zhou, Shang-Ming; Brophy, Sinead

International Journal of Population Data Science, 2021-06-16, 1

PMID: 34189274

Similar Datasets

Project description:

Background: Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research that has implications for epidemiology, clinical care, and health services research.

Objective: This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions.

Methods: A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published between January 2000 and April 2020, both inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA) guidelines.

Results: A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed either in primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatology had the highest number of developed algorithms. Data-driven and clinical rule-based approaches have been identified. EMR-based phenotype and algorithm development reflect the data access allowed by respective health systems, and algorithms vary in their performance.

Conclusions: Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies. Several strategies to assist with phenotype-based case definitions have been proposed.

Project description:

Objective: Electronic health records (EHR) offer medical and pharmacogenomics research unprecedented opportunities to identify and classify patients at risk. EHRs are collections of highly inter-dependent records that include biological, anatomical, physiological, and behavioral observations. They comprise a patient's clinical phenome, where each patient has thousands of date-stamped records distributed across many relational tables. Development of EHR computer-based phenotyping algorithms requires time and medical insight from clinical experts, who most often can only review a small patient subset representative of the total EHR records, to identify phenotype features. In this research we evaluate whether relational machine learning (ML) using inductive logic programming (ILP) can contribute to addressing these issues as a viable approach for EHR-based phenotyping.

Methods: Two relational learning ILP approaches and three well-known WEKA (Waikato Environment for Knowledge Analysis) implementations of non-relational approaches (PART, J48, and JRIP) were used to develop models for nine phenotypes. International Classification of Diseases, Ninth Revision (ICD-9) coded EHR data were used to select training cohorts for the development of each phenotypic model. Accuracy, precision, recall, F-Measure, and Area Under the Receiver Operating Characteristic (AUROC) curve statistics were measured for each phenotypic model based on independent manually verified test cohorts. A two-sided binomial distribution test (sign test) compared the five ML approaches across phenotypes for statistical significance.

Results: We developed an approach to automatically label training examples using ICD-9 diagnosis codes for the ML approaches being evaluated. Nine phenotypic models for each ML approach were evaluated, resulting in better overall model performance in AUROC using ILP when compared to PART (p=0.039), J48 (p=0.003) and JRIP (p=0.003).

Discussion: ILP has the potential to improve phenotyping by independently delivering clinically expert interpretable rules for phenotype definitions, or intuitive phenotypes to assist experts.

Conclusion: Relational learning using ILP offers a viable approach to EHR-driven phenotyping.
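For readers unfamiliar with the evaluation statistics named in this description, the following is a minimal sketch of how precision, recall, and F-measure could be computed for one phenotype model against a manually verified test cohort. The function name `precision_recall_f1` and the toy labels are assumptions for illustration only; they are not taken from the study and do not reproduce its results.

```python
def precision_recall_f1(predicted, verified):
    """Compute precision, recall, and F1 from parallel lists of booleans.

    predicted[i] is the model's label for patient i;
    verified[i] is the manually verified label for the same patient.
    """
    tp = sum(1 for p, v in zip(predicted, verified) if p and v)
    fp = sum(1 for p, v in zip(predicted, verified) if p and not v)
    fn = sum(1 for p, v in zip(predicted, verified) if not p and v)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy labels, invented for this sketch: 2 true positives, 1 false positive,
# 1 false negative, 1 true negative.
predicted = [True, True, False, True, False]
verified = [True, False, False, True, True]

precision, recall, f1 = precision_recall_f1(predicted, verified)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```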

Project description:

Introduction: Electronic health record (EHR)-driven phenotyping is a critical first step in generating biomedical knowledge from EHR data. Despite recent progress, current phenotyping approaches are manual, time-consuming, error-prone, and platform-specific. This results in duplication of effort and highly variable results across systems and institutions, and is not scalable or portable. In this work, we investigate how the nascent Clinical Quality Language (CQL) can address these issues and enable high-throughput, cross-platform phenotyping.

Methods: We selected a clinically validated heart failure (HF) phenotype definition and translated it into CQL, then developed a CQL execution engine to integrate with the Observational Health Data Sciences and Informatics (OHDSI) platform. We executed the phenotype definition at two large academic medical centers, Northwestern Medicine and Weill Cornell Medicine, and conducted results verification (n = 100) to determine precision and recall. We additionally executed the same phenotype definition against two different data platforms, OHDSI and Fast Healthcare Interoperability Resources (FHIR), using the same underlying dataset and compared the results.

Results: CQL is expressive enough to represent the HF phenotype definition, including Boolean and aggregate operators, and temporal relationships between data elements. The language design also enabled the implementation of a custom execution engine with relative ease, and results verification at both sites revealed that precision and recall were both 100%. Cross-platform execution resulted in identical patient cohorts generated by both data platforms.

Conclusions: CQL supports the representation of arbitrarily complex phenotype definitions, and our execution engine implementation demonstrated cross-platform execution against two widely used clinical data platforms. The language thus has the potential to help address current limitations with portability in EHR-driven phenotyping and scale in learning health systems.

Project description:

Background: The transition to the electronic health record (EHR) has brought forth a rapid cultural shift in the world of medicine, presenting both new challenges and opportunities for improving health care. As clinicians work to adapt to the changes imposed by the EHR, identification of best practices around the clinically excellent use of the EHR is needed.

Objective: Using the domains of clinical excellence previously defined by the Johns Hopkins Miller Coulson Academy of Clinical Excellence, this review aims to identify best practices around the clinically excellent use of the EHR.

Methods: The authors searched the PubMed database, using keywords related to clinical excellence domains and the EHR, to capture the English-language, peer-reviewed literature published between January 1, 2000, and August 2, 2016. One author independently reviewed each article and extracted relevant data.

Results: The search identified 606 titles, with the majority (393/606, 64.9%) in the domain of communication and interpersonal skills. Twenty-eight of the 606 (4.6%) titles were excluded from full-text review, primarily due to lack of availability of the full-text article. The remaining 578 full-text articles reviewed were related to clinical excellence generally (3/578, 0.5%) or the specific domains of communication and interpersonal skills (380/578, 65.7%), diagnostic acumen (31/578, 5.4%), skillful negotiation of the health care system (4/578, 0.7%), scholarly approach to clinical practice (41/578, 7.1%), professionalism and humanism (2/578, 0.4%), knowledge (97/578, 16.8%), and passion for clinical medicine (20/578, 3.5%).

Conclusions: Results suggest that as familiarity and expertise are developed, clinicians are leveraging the EHR to provide clinically excellent care. Best practices identified included deliberate physical configuration of the clinical space to involve sharing the screen with patients and limiting EHR use during difficult and emotional topics. Promising horizons for the EHR include the ability to augment participation in pragmatic trials, identify adverse drug effects, correlate genomic data to clinical outcomes, and follow data-driven guidelines. Clinician and patient satisfaction with the EHR has generally improved with time, and hopefully continued clinician and patient input will lead to a system that satisfies all.
