Dataset Information

Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation.

ABSTRACT: BACKGROUND:Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. OBJECTIVE:In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text-Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). METHODS:CREATE is a proof-of-concept system that leverages a combination of structured queries and information retrieval techniques on natural language processing results to improve cohort retrieval performance using the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support the common data model concept search utilizing information retrieval techniques and frameworks. RESULTS:Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient-level and document-level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, which outperforms systems using only structured data or only unstructured text with mean precision at 5 values of 0.54 and 0.74, respectively. CONCLUSIONS:The implementation and evaluation of Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that only use one of either structured data or unstructured text in complex textual cohort queries.

SUBMITTER: Liu S

PROVIDER: S-EPMC7576539 | biostudies-literature | 2020 Oct

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation.

Liu Sijia S Wang Yanshan Y Wen Andrew A Wang Liwei L Hong Na N Shen Feichen F Bedrick Steven S Hersh William W Liu Hongfang H

JMIR medical informatics 20201006 10

<h4>Background</h4>Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records.<h ...[more]

PMID: 33021486

Similar Datasets

Project description:BackgroundElectronic health records (EHRs, such as those created by an anesthesia management system) generate a large amount of data that can notably be reused for clinical audits and scientific research. The sharing of these data and tools is generally affected by the lack of system interoperability. To overcome these issues, Observational Health Data Sciences and Informatics (OHDSI) developed the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) to standardize EHR data and promote large-scale observational and longitudinal research. Anesthesia data have not previously been mapped into the OMOP CDM.ObjectiveThe primary objective was to transform anesthesia data into the OMOP CDM. The secondary objective was to provide vocabularies, queries, and dashboards that might promote the exploitation and sharing of anesthesia data through the CDM.MethodsUsing our local anesthesia data warehouse, a group of 5 experts from 5 different medical centers identified local concepts related to anesthesia. The concepts were then matched with standard concepts in the OHDSI vocabularies. We performed structural mapping between the design of our local anesthesia data warehouse and the OMOP CDM tables and fields. To validate the implementation of anesthesia data into the OMOP CDM, we developed a set of queries and dashboards.ResultsWe identified 522 concepts related to anesthesia care. They were classified as demographics, units, measurements, operating room steps, drugs, periods of interest, and features. After semantic mapping, 353 (67.7%) of these anesthesia concepts were mapped to OHDSI concepts. Further, 169 (32.3%) concepts related to periods and features were added to the OHDSI vocabularies. Then, 8 OMOP CDM tables were implemented with anesthesia data and 2 new tables (EPISODE and FEATURE) were added to store secondarily computed data. We integrated data from 5,72,609 operations and provided the code for a set of 8 queries and 4 dashboards related to anesthesia care.ConclusionsGeneric data concerning demographics, drugs, units, measurements, and operating room steps were already available in OHDSI vocabularies. However, most of the intraoperative concepts (the duration of specific steps, an episode of hypotension, etc) were not present in OHDSI vocabularies. The OMOP mapping provided here enables anesthesia data reuse.

Project description:BackgroundPatient-monitoring software generates a large amount of data that can be reused for clinical audits and scientific research. The Observational Health Data Sciences and Informatics (OHDSI) consortium developed the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to standardize electronic health record data and promote large-scale observational and longitudinal research.ObjectiveThis study aimed to transform primary care data into the OMOP CDM format.MethodsWe extracted primary care data from electronic health records at a multidisciplinary health center in Wattrelos, France. We performed structural mapping between the design of our local primary care database and the OMOP CDM tables and fields. Local French vocabularies concepts were mapped to OHDSI standard vocabularies. To validate the implementation of primary care data into the OMOP CDM format, we applied a set of queries. A practical application was achieved through the development of a dashboard.ResultsData from 18,395 patients were implemented into the OMOP CDM, corresponding to 592,226 consultations over a period of 20 years. A total of 18 OMOP CDM tables were implemented. A total of 17 local vocabularies were identified as being related to primary care and corresponded to patient characteristics (sex, location, year of birth, and race), units of measurement, biometric measures, laboratory test results, medical histories, and drug prescriptions. During semantic mapping, 10,221 primary care concepts were mapped to standard OHDSI concepts. Five queries were used to validate the OMOP CDM by comparing the results obtained after the completion of the transformations with the results obtained in the source software. Lastly, a prototype dashboard was developed to visualize the activity of the health center, the laboratory test results, and the drug prescription data.ConclusionsPrimary care data from a French health care facility have been implemented into the OMOP CDM format. Data concerning demographics, units, measurements, and primary care consultation steps were already available in OHDSI vocabularies. Laboratory test results and drug prescription data were mapped to available vocabularies and structured in the final model. A dashboard application provided health care professionals with feedback on their practice.

Project description:BackgroundMany patients with atrial fibrillation (AF) remain undiagnosed despite availability of interventions to reduce stroke risk. Predictive models to date are limited by data requirements and theoretical usage. We aimed to develop a model for predicting the 2-year probability of AF diagnosis and implement it as proof-of-concept (POC) in a production electronic health record (EHR).MethodsWe used a nested case-control design using data from the Indiana Network for Patient Care. The development cohort came from 2016 to 2017 (outcome period) and 2014 to 2015 (baseline). A separate validation cohort used outcome and baseline periods shifted 2 years before respective development cohort times. Machine learning approaches were used to build predictive model. Patients ≥ 18 years, later restricted to age ≥ 40 years, with at least two encounters and no AF during baseline, were included. In the 6-week EHR prospective pilot, the model was silently implemented in the production system at a large safety-net urban hospital. Three new and two previous logistic regression models were evaluated using receiver-operating characteristics. Number, characteristics, and CHA2DS2-VASc scores of patients identified by the model in the pilot are presented.ResultsAfter restricting age to ≥ 40 years, 31,474 AF cases (mean age, 71.5 years; female 49%) and 22,078 controls (mean age, 59.5 years; female 61%) comprised the development cohort. A 10-variable model using age, acute heart disease, albumin, body mass index, chronic obstructive pulmonary disease, gender, heart failure, insurance, kidney disease, and shock yielded the best performance (C-statistic, 0.80 [95% CI 0.79-0.80]). The model performed well in the validation cohort (C-statistic, 0.81 [95% CI 0.8-0.81]). In the EHR pilot, 7916/22,272 (35.5%; mean age, 66 years; female 50%) were identified as higher risk for AF; 5582 (70%) had CHA2DS2-VASc score ≥ 2.ConclusionsUsing variables commonly available in the EHR, we created a predictive model to identify 2-year risk of developing AF in those previously without diagnosed AF. Successful POC implementation of the model in an EHR provided a practical strategy to identify patients who may benefit from interventions to reduce their stroke risk.

Project description:BackgroundThe Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) has been implemented on various claims and electronic health record (EHR) databases, but has not been applied to a hospital transactional database. This study addresses the implementation of the OMOP CDM on the U.S. Premier Hospital database.MethodsWe designed and implemented an extract, transform, load (ETL) process to convert the Premier hospital database into the OMOP CDM. Standard charge codes in Premier were mapped between the OMOP version 4.0 Vocabulary and standard charge descriptions. Visit logic was added to impute the visit dates. We tested the conversion by replicating a published study using the raw and transformed databases. The Premier hospital database was compared to a claims database, in regard to prevalence of disease.FindingsThe data transformed into the CDM resulted in 1% of the data being discarded due to data errors in the raw data. A total of 91.4% of Premier standard charge codes were mapped successfully to a standard vocabulary. The results of the replication study resulted in a similar distribution of patient characteristics. The comparison to the claims data yields notable similarities and differences amongst conditions represented in both databases.DiscussionThe transformation of the Premier database into the OMOP CDM version 4.0 adds value in conducting analyses due to successful mapping of the drugs and procedures. The addition of visit logic gives ordinality to drugs and procedures that wasn't present prior to the transformation. Comparing conditions in Premier against a claims database can provide an understanding about Premier's potential use in pharmacoepidemiology studies that are traditionally conducted via claims databases.Conclusion and next stepsThe conversion of the Premier database into the OMOP CDM 4.0 was completed successfully. The next steps include refinement of vocabularies and mappings and continual maintenance of the transformed CDM.

Project description:BackgroundCancer staging information is an essential component of cancer research. However, the information is primarily stored as either a full or semistructured free-text clinical document which is limiting the data use. By transforming the cancer-specific data to the Observational Medical Outcome Partnership Common Data Model (OMOP CDM), the information can contribute to establish multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date.ObjectiveWe aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports.MethodsThyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 Iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was derived and populated using the transformed CDM data.ResultsThe extracted thyroid cancer data were completely converted into the OMOP CDM. The distributions of histopathological types of thyroid cancer were approximately 95.3 to 98.8% of papillary carcinoma, 0.9 to 3.7% of follicular carcinoma, 0.04 to 0.54% of adenocarcinoma, 0.17 to 0.81% of medullary carcinoma, and 0 to 0.3% of anaplastic carcinoma. Regarding cancer staging, stage-I thyroid cancer accounted for 55 to 64% of the cases, while stage III accounted for 24 to 26% of the cases. Stage-II and -IV thyroid cancers were detected at a low rate of 2 to 6%.ConclusionAs a first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions to standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies.

Project description:BackgroundNational classifications and terminologies already routinely used for documentation within patient care settings enable the unambiguous representation of clinical information. However, the diversity of different vocabularies across health care institutions and countries is a barrier to achieving semantic interoperability and exchanging data across sites. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) enables the standardization of structure and medical terminology. It allows the mapping of national vocabularies into so-called standard concepts, representing normative expressions for international analyses and research. Within our project "Hybrid Quality Indicators Using Machine Learning Methods" (Hybrid-QI), we aim to harmonize source codes used in German claims data vocabularies that are currently unavailable in the OMOP CDM.ObjectiveThis study aims to increase the coverage of German vocabularies in the OMOP CDM. We aim to completely transform the source codes used in German claims data into the OMOP CDM without data loss and make German claims data usable for OMOP CDM-based research.MethodsTo prepare the missing German vocabularies for the OMOP CDM, we defined a vocabulary preparation approach consisting of the identification of all codes of the corresponding vocabularies, their assembly into machine-readable tables, and the translation of German designations into English. Furthermore, we used 2 proposed approaches for OMOP-compliant vocabulary preparation: the mapping to standard concepts using the Observational Health Data Sciences and Informatics (OHDSI) tool Usagi and the preparation of new 2-billion concepts (ie, concept_id >2 billion). Finally, we evaluated the prepared vocabularies regarding completeness and correctness using synthetic German claims data and calculated the coverage of German claims data vocabularies in the OMOP CDM.ResultsOur vocabulary preparation approach was able to map 3 missing German vocabularies to standard concepts and prepare 8 vocabularies as new 2-billion concepts. The completeness evaluation showed that the prepared vocabularies cover 44.3% (3288/7417) of the source codes contained in German claims data. The correctness evaluation revealed that the specified validity periods in the OMOP CDM are compliant for the majority (705,531/706,032, 99.9%) of source codes and associated dates in German claims data. The calculation of the vocabulary coverage showed a noticeable decrease of missing vocabularies from 55% (11/20) to 10% (2/20) due to our preparation approach.ConclusionsBy preparing 10 vocabularies, we showed that our approach is applicable to any type of vocabulary used in a source data set. The prepared vocabularies are currently limited to German vocabularies, which can only be used in national OMOP CDM research projects, because the mapping of new 2-billion concepts to standard concepts is missing. To participate in international OHDSI network studies with German claims data, future work is required to map the prepared 2-billion concepts to standard concepts.

Dataset Information

Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation.

Publications

Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets