Dataset Information

Phe2vec: Automated disease phenotyping based on unsupervised embeddings from electronic health records.


ABSTRACT: Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning, and assess its effectiveness against standard rule-based algorithms from the Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and of patients' clinical histories. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. In a head-to-head comparison of Phe2vec and PheKB cohorts using chart review, Phe2vec performed on par with or better than PheKB in nine out of ten diseases. Unlike other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts.
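
The three steps the abstract describes (concept embeddings, a phenotype built from a seed concept and its neighbors, and patient linkage by closeness in the embedding space) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the record: the toy three-dimensional vectors, the concept and patient names, cosine similarity as the closeness measure, plain averaging to form phenotype and patient vectors, and the values of `k` and `threshold`. It is not the authors' implementation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_phenotype(seed, embeddings, k=2):
    """Return the seed concept plus its k nearest neighbors (assumed: cosine similarity)."""
    seed_vec = embeddings[seed]
    scored = [(cosine(seed_vec, vec), name)
              for name, vec in embeddings.items() if name != seed]
    scored.sort(reverse=True)  # most similar first
    return [seed] + [name for _, name in scored[:k]]

def mean_vector(concepts, embeddings):
    """Average the embeddings of a list of concepts (assumed aggregation)."""
    return np.mean([embeddings[c] for c in concepts], axis=0)

def link_patients(patients, phenotype, embeddings, threshold=0.8):
    """Return IDs of patients whose averaged history is close to the phenotype vector."""
    pheno_vec = mean_vector(phenotype, embeddings)
    return sorted(
        pid for pid, history in patients.items()
        if cosine(mean_vector(history, embeddings), pheno_vec) >= threshold
    )

# Hypothetical toy embeddings; in practice these would be learned from EHR data.
concept_embeddings = {
    "type2_diabetes": np.array([1.0, 0.0, 0.0]),
    "metformin": np.array([0.9, 0.1, 0.0]),
    "hba1c_test": np.array([0.8, 0.2, 0.0]),
    "asthma": np.array([0.0, 1.0, 0.0]),
    "albuterol": np.array([0.1, 0.9, 0.0]),
}

# Hypothetical patients, each represented by the concepts in their history.
patients = {
    "patient_a": ["metformin", "hba1c_test"],
    "patient_b": ["asthma", "albuterol"],
}

phenotype = build_phenotype("type2_diabetes", concept_embeddings, k=2)
cohort = link_patients(patients, phenotype, concept_embeddings, threshold=0.8)
print(phenotype)  # ['type2_diabetes', 'metformin', 'hba1c_test']
print(cohort)     # ['patient_a']
```

In this toy setting the seed pulls in the two concepts nearest to it, and only the patient whose history lies near that phenotype vector is linked to the disease. A real pipeline would need learned embeddings and a principled way to choose the neighborhood size and the linkage threshold.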

SUBMITTER: De Freitas JK 

PROVIDER: S-EPMC8441576 | biostudies-literature | 2021 Sep

REPOSITORIES: biostudies-literature

Publications

Phe2vec: Automated disease phenotyping based on unsupervised embeddings from electronic health records.

Jessica K. De Freitas, Kipp W. Johnson, Eddye Golden, Girish N. Nadkarni, Joel T. Dudley, Erwin P. Bottinger, Benjamin S. Glicksberg, Riccardo Miotto

Patterns (New York, N.Y.), 2021 Sep 2, issue 9

Similar Datasets

| S-EPMC10239346 | biostudies-literature
| S-EPMC5904248 | biostudies-literature
| S-EPMC11469380 | biostudies-literature
| S-EPMC10112494 | biostudies-literature
| S-EPMC8137882 | biostudies-literature
| S-EPMC6620318 | biostudies-literature
| S-EPMC11478862 | biostudies-literature
| S-EPMC9516830 | biostudies-literature
| S-EPMC10727496 | biostudies-literature
| S-EPMC10801250 | biostudies-literature