Dataset Information

EHR-based phenotyping: Bulk learning and evaluation.

ABSTRACT: In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping relies largely on either manual knowledge engineering or supervised learning, in which clinical cases are represented by variables such as diagnoses, medicinal treatments and laboratory tests. In such a framework, feature engineering and data annotation remain tedious and expensive, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which makes it difficult to establish accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, an approach we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned within an abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogate labels. In particular, training with surrogate labels makes it possible to evaluate the resulting models using only a sparse annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, within a low-dimensional abstract feature space that captures the shared clinical traits of these target diseases, collectively referred to as the bulk learning set.
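The abstract describes a two-level scheme: base models trained on diagnostic-code surrogate labels define a low-dimensional abstract feature space shared by every disease in the bulk learning set, and each phenotype is then learned and evaluated within that space against one small annotated sample. The following is a minimal sketch under stated assumptions, not the authors' implementation: plain stacked logistic regression stands in for their ensemble method, patients are assumed to be binary feature vectors, and all names (`bulk_learn`, `sparse_evaluate`, `disease_a`, `disease_b`) and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# ASSUMPTIONS (hypothetical, for illustration only): each patient is a binary
# feature vector (e.g., lab or medication indicators); diagnostic codes are used
# only as surrogate labels; logistic regression is a stand-in base learner.

Vector = Sequence[float]
Weights = List[float]  # [bias, w_1, ..., w_d]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Weights, x: Vector) -> float:
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logreg(X: Sequence[Vector], y: Sequence[int],
                 lr: float = 1.0, epochs: int = 2000) -> Weights:
    """Full-batch gradient descent on the mean logistic loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def abstract_features(base: Dict[str, Weights], x: Vector) -> List[float]:
    """Project raw features into the shared low-dimensional space:
    one surrogate-trained score per disease in the bulk learning set."""
    return [predict_proba(base[d], x) for d in sorted(base)]


def bulk_learn(X: Sequence[Vector], surrogate: Dict[str, List[int]]):
    # Level 1: one base model per disease, trained on surrogate (code) labels.
    base = {d: train_logreg(X, y) for d, y in surrogate.items()}
    # Level 2: every disease is re-learned from the same abstract features.
    Z = [abstract_features(base, x) for x in X]
    meta = {d: train_logreg(Z, y) for d, y in surrogate.items()}
    return base, meta


def classify(base, meta, disease: str, x: Vector, threshold: float = 0.5) -> int:
    return int(predict_proba(meta[disease], abstract_features(base, x)) >= threshold)


def sparse_evaluate(base, meta, annotated: Dict[str, List[Tuple[Vector, int]]]):
    """Accuracy per disease against a small set of gold-standard annotations."""
    return {d: sum(classify(base, meta, d, x) == gold for x, gold in pairs) / len(pairs)
            for d, pairs in annotated.items()}


# Hypothetical toy data: features 0-1 relate to disease_a, 2-3 to disease_b.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1],
     [0, 0, 0, 0], [0, 0, 0, 0]]
surrogate = {"disease_a": [1, 1, 1, 0, 0, 0, 0, 0],
             "disease_b": [0, 0, 0, 1, 1, 1, 0, 0]}
# Sparse gold-standard sample (e.g., from chart review), shared across diseases.
annotated = {"disease_a": [([1, 0, 0, 0], 1), ([0, 0, 1, 0], 0)],
             "disease_b": [([0, 0, 0, 1], 1), ([0, 1, 0, 0], 0)]}

base, meta = bulk_learn(X, surrogate)
accuracy = sparse_evaluate(base, meta, annotated)
print(accuracy)
```

For brevity, the level-2 models here are fit on in-sample level-1 outputs; a more careful stacking setup would use out-of-fold level-1 predictions so that the abstract features do not leak the surrogate labels they were trained on.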

SUBMITTER: Chiu PH

PROVIDER: S-EPMC5934756 | biostudies-literature | 2017 Jun

REPOSITORIES: biostudies-literature

Publications

EHR-based phenotyping: Bulk learning and evaluation.

Po-Hsiang Chiu, George Hripcsak

Journal of Biomedical Informatics, 2017-04-12

PMID: 28410982

Similar Datasets

Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping
| S-EPMC10653017 | biostudies-literature

V-Model: a new perspective for EHR-based phenotyping.
| S-EPMC4283133 | biostudies-literature

A Bayesian latent class approach for EHR-based phenotyping.
| S-EPMC6519239 | biostudies-literature

Evaluation of matched control algorithms in EHR-based phenotyping studies: a case study of inflammatory bowel disease comorbidities.
| S-EPMC4261034 | biostudies-literature

Design and validation of a FHIR-based EHR-driven phenotyping toolbox.
| S-EPMC9382394 | biostudies-literature

Designing An Individualized EHR Learning Plan For Providers.
| S-EPMC6220690 | biostudies-other

Evaluation and optimization of sequence-based gene regulatory deep learning models
2024-02-03 | GSE254493 | GEO

EHR phenotyping via jointly embedding medical concepts and words into a unified vector space.
| S-EPMC6290514 | biostudies-other

Image-based phenotyping of disaggregated cells using deep learning.
| S-EPMC7666170 | biostudies-literature

Evaluation of Machine Learning Interatomic Potentials for Gold Nanoparticles-Transferability towards Bulk.
| S-EPMC10303715 | biostudies-literature