Dataset Information

Generative transfer learning for measuring plausibility of EHR diagnosis records.

ABSTRACT:

Objective

Due to a complex set of processes involved with the recording of health information in the Electronic Health Records (EHRs), the truthfulness of EHR diagnosis records is questionable. We present a computational approach to estimate the probability that a single diagnosis record in the EHR reflects the true disease.

Materials and methods

Using EHR data on 18 diseases from the Mass General Brigham (MGB) Biobank, we develop generative classifiers on a small set of disease-agnostic features from EHRs that aim to represent Patients, pRoviders, and their Interactions within the healthcare SysteM (PRISM features).
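The abstract does not give the model family or the actual PRISM feature definitions, so the following is only a minimal sketch of what a generative classifier does in general. It assumes a Gaussian naive Bayes model and two made-up features (a count of diagnosis records and a count of distinct providers). The function names `fit_gaussian_nb` and `posterior_true`, the features, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature Gaussian (mean, variance) for each class."""
    model = {}
    n = len(rows)
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in members]
            mean = sum(col) / len(col)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        model[c] = (len(members) / n, stats)
    return model

def posterior_true(model, x):
    """P(label = 1 | x) via Bayes' rule, computed in log space for stability."""
    log_joint = {}
    for c, (prior, stats) in model.items():
        lj = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            lj += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        log_joint[c] = lj
    m = max(log_joint.values())
    total = sum(math.exp(v - m) for v in log_joint.values())
    return math.exp(log_joint[1] - m) / total

# Toy, invented data: [diagnosis record count, distinct provider count].
# Label 1 = record reflects true disease, 0 = it does not (assumed labels).
rows = [[1, 1], [2, 1], [1, 2], [8, 3], [10, 4], [9, 5]]
labels = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(rows, labels)
p_high = posterior_true(model, [9, 4])  # resembles the labeled-true examples
p_low = posterior_true(model, [1, 1])   # resembles the labeled-false examples
```

The point of the sketch is that a generative model learns the joint distribution of features and label, so the output is a probability per record rather than only a class decision.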

Results

We demonstrate that PRISM features and the generative PRISM classifiers are potent for estimating disease probabilities and exhibit generalizable and transferable distributional characteristics across diseases and patient populations. The joint probabilities we learn about diseases through the PRISM features via PRISM generative models are transferable and generalizable to multiple diseases.
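One way to read the transferability claim is: fit a generative model on records for one disease, then score records for a different disease without refitting. The sketch below illustrates that evaluation pattern only. It assumes a Gaussian naive Bayes model and fully synthetic data in which "disease A" and "disease B" share a similar feature distribution up to a shift; none of the names, numbers, or distributions come from the paper.

```python
import math
import random

def fit(rows, labels):
    """Per-class prior plus per-feature Gaussian mean/variance (naive Bayes)."""
    model = {}
    for c in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for col in zip(*members):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        model[c] = (len(members) / len(rows), stats)
    return model

def log_odds(model, x):
    """log P(1 | x) - log P(0 | x); monotone in the posterior probability."""
    def log_joint(c):
        prior, stats = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(x, stats))
    return log_joint(1) - log_joint(0)

def auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def simulate(rng, n, shift):
    """Synthetic two-feature records; `shift` moves the whole feature distribution."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        rows.append([rng.gauss(3.0 * y + shift, 1.0),
                     rng.gauss(2.0 * y + shift, 1.0)])
        labels.append(y)
    return rows, labels

rng = random.Random(0)
source_rows, source_labels = simulate(rng, 300, shift=0.0)  # stand-in for "disease A"
target_rows, target_labels = simulate(rng, 300, shift=0.5)  # stand-in for "disease B"

model = fit(source_rows, source_labels)  # trained on disease A only
transfer_auc = auc([log_odds(model, x) for x in target_rows], target_labels)
print(round(transfer_auc, 3))
```

Ranking quality (AUC) is used here because a shifted target population can miscalibrate the probabilities while leaving the ordering of records largely intact; whether the paper evaluates transfer this way is not stated in the abstract.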

Discussion

The Generative Transfer Learning (GTL) approach with PRISM classifiers enables the scalable validation of computable phenotypes in EHRs without the need for domain-specific knowledge about specific disease processes.

Conclusion

Probabilities computed from the generative PRISM classifier can enhance and accelerate applied Machine Learning research and discoveries with EHR data.

SUBMITTER: Estiri H

PROVIDER: S-EPMC7936395 | biostudies-literature | 2021 Mar

REPOSITORIES: biostudies-literature

Publications

Generative transfer learning for measuring plausibility of EHR diagnosis records.

Hossein Estiri, Sebastien Vasey, Shawn N Murphy

Journal of the American Medical Informatics Association : JAMIA, 2021 Mar 1, issue 3

PMID: 33043366

Similar Datasets

Project description:

Introduction: Effective electronic health record (EHR)-based training interventions facilitate improved EHR use for healthcare providers. One such training intervention is simulation-based training that emphasises learning actual tasks through experimentation in a risk-free environment without negative patient outcomes. EHR-specific simulation-based training can be employed to improve EHR use, thereby enhancing healthcare providers' skills and behaviours. Despite the potential advantages of this type of training, no study has identified and mapped the available evidence. To fill that gap, this scoping review will synthesise the current state of literature on EHR simulation-based training.

Methods and analysis: The Arksey and O'Malley methodological framework will be employed. Three databases (PubMed, Embase and Cumulative Index to Nursing and Allied Health Literature) will be searched for published articles. ProQuest and Google Scholar will be searched to identify unpublished articles. Databases will be searched from inception to 29 January 2020. Only articles written in English, randomised control trials, cohort studies, cross-sectional studies and case-control studies will be considered for inclusion. Two reviewers will independently screen titles and abstracts against inclusion and exclusion criteria. Then, they will review full texts to determine articles for final inclusion. Citation chaining will be conducted to manually screen references of all included studies to identify additional studies not found by the search. A data abstraction form with relevant characteristics will be developed to help address the research question. Descriptive numerical analysis will be used to describe characteristics of included studies. Based on the extracted data, research evidence of EHR simulation-based training will be synthesised.

Ethics and dissemination: Since no primary data will be collected, there will be no formal ethical review. Research findings will be disseminated through publications, presentations and meetings with relevant stakeholders.
