EHR phenotyping via jointly embedding medical concepts and words into a unified vector space.
ABSTRACT:
BACKGROUND: There has been increasing interest in learning low-dimensional vector representations of medical concepts from Electronic Health Records (EHRs). Vector representations of medical concepts facilitate exploratory analysis and predictive modeling of EHR data, yielding insights into patterns of care and health outcomes. EHRs contain structured data, such as diagnostic codes and laboratory tests, as well as unstructured free-text data in the form of clinical notes, which provide more detail about the condition and treatment of patients.
METHODS: In this work, we propose a method that jointly learns vector representations of medical concepts and words. This is achieved by a novel learning scheme based on the word2vec model. Our model learns these relationships by integrating clinical notes with their accompanying sets of medical codes and by defining joint contexts for each observed word and medical code.
RESULTS: In our experiments, we learned joint representations using MIMIC-III data. Using the learned representations of words and medical codes, we evaluated phenotypes for six diseases discovered by our method and by a baseline method. The experimental results show that, for each of the six diseases, our method finds highly relevant words. We also show that our representations are useful for predicting the reason for the next visit.
CONCLUSIONS: The jointly learned representations of medical concepts and words capture not only similarity between codes or similarity between words, but also similarity between codes and words. They can be used to extract phenotypes of different diseases, and the representations learned by the joint model are also useful for constructing patient features.
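To make the joint-context idea concrete, the sketch below illustrates one simplified reading of the method: each clinical note's word tokens and the visit's medical codes are merged into a single mixed sequence, so a standard skip-gram word2vec model embeds codes and words in one vector space. This is a minimal illustration under stated assumptions, not the authors' actual pipeline; the helper `build_joint_sequences`, the `ICD9_` prefixing convention, and the toy data are all hypothetical.

```python
# Minimal sketch: joint code/word embedding via skip-gram word2vec.
# Assumes (note_text, code_list) pairs as input; data here is toy/hypothetical.
from gensim.models import Word2Vec

def build_joint_sequences(notes_with_codes):
    """Turn (note_text, code_list) pairs into mixed token sequences.

    Prefixing codes (e.g. 'ICD9_428.0') keeps them lexically distinct from
    words while letting codes and words share one vocabulary and one space.
    """
    sequences = []
    for text, codes in notes_with_codes:
        words = text.lower().split()              # naive tokenization, for illustration
        code_tokens = [f"ICD9_{c}" for c in codes]
        # Appending the visit's codes to the note places each code inside
        # the context window of nearby words (a crude joint context).
        sequences.append(words + code_tokens)
    return sequences

# Hypothetical stand-in for MIMIC-III notes with accompanying codes.
notes_with_codes = [
    ("patient admitted with shortness of breath and leg edema", ["428.0"]),
    ("chest pain radiating to left arm troponin elevated", ["410.9"]),
]

model = Word2Vec(
    sentences=build_joint_sequences(notes_with_codes),
    vector_size=100,   # embedding dimension
    window=5,          # context window over the mixed word/code sequence
    min_count=1,
    sg=1,              # skip-gram, as in the original word2vec
)

# Because codes and words live in one space, cross-modal nearest-neighbour
# queries become possible, e.g. the words closest to a diagnosis code:
print(model.wv.most_similar("ICD9_428.0", topn=5))
```

Such cross-modal neighbour lists are one way to read off disease phenotypes, and averaging the learned vectors over a patient's codes and words gives a simple patient feature vector of the kind the conclusions mention.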
SUBMITTER: Bai T
PROVIDER: S-EPMC6290514 | biostudies-other | 2018 Dec
REPOSITORIES: biostudies-other