Dataset Information

Enabling phenotypic big data with PheNorm.

ABSTRACT: Objective: Electronic health record (EHR)-based phenotyping infers whether a patient has a disease based on the information in his or her EHR. A human-annotated training set with gold-standard disease status labels is usually required to build an algorithm for phenotyping based on a set of predictive features. The time intensiveness of annotation and feature curation severely limits the ability to achieve high-throughput phenotyping. While previous studies have successfully automated feature curation, annotation remains a major bottleneck. In this paper, we present PheNorm, a phenotyping algorithm that does not require expert-labeled samples for training. Methods: The most predictive features, such as the number of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes or mentions of the target phenotype, are normalized to resemble a normal mixture distribution with high area under the receiver operating characteristic curve (AUC) for prediction. The transformed features are then denoised and combined into a score for accurate disease classification. Results: We validated the accuracy of PheNorm with 4 phenotypes: coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis. The AUCs of the PheNorm score reached 0.90, 0.94, 0.95, and 0.94 for the 4 phenotypes, respectively, which were comparable to the accuracy of supervised algorithms trained with sample sizes of 100-300, with no statistically significant difference. Conclusion: The accuracy of the PheNorm algorithms is on par with algorithms trained with annotated samples. PheNorm fully automates the generation of accurate phenotyping algorithms and demonstrates the capacity for EHR-driven annotations to scale to the next level: phenotypic big data.
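
The normalization step in the Methods can be illustrated with a small sketch. This is not the authors' implementation. It assumes a hypothetical utilization measure (for example, a note count), a fixed exponent alpha = 1.0, simulated data, and a plain EM fit of a two-component normal mixture. The record does not say how PheNorm tunes the normalization or how the denoising step works, so neither is reproduced here.

```python
import numpy as np

def normalize_count(count, utilization, alpha=1.0):
    """Log-transform a count feature and subtract alpha * log(utilization).
    alpha = 1.0 is an illustrative assumption, not a tuned value."""
    return np.log1p(count) - alpha * np.log1p(utilization)

def component_densities(z, weights, means, sds):
    """Weighted normal density of each mixture component at each z (shape n x 2)."""
    z = np.asarray(z, dtype=float)[:, None]
    return weights * np.exp(-0.5 * ((z - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))

def fit_two_normal_mixture(z, n_iter=200):
    """Fit a two-component 1-D normal mixture by EM; no labels are used."""
    z = np.asarray(z, dtype=float)
    med = np.median(z)
    low, high = z[z <= med], z[z > med]
    # Initialise from the lower and upper halves of the data.
    weights = np.array([0.5, 0.5])
    means = np.array([low.mean(), high.mean()])
    sds = np.array([low.std(), high.std()]) + 1e-6
    for _ in range(n_iter):
        # E-step: responsibility of each component for each patient.
        dens = component_densities(z, weights, means, sds)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0)
        weights = nk / z.size
        means = (resp * z[:, None]).sum(axis=0) / nk
        sds = np.sqrt((resp * (z[:, None] - means) ** 2).sum(axis=0) / nk) + 1e-6
    return weights, means, sds

def disease_probability(z, weights, means, sds):
    """Posterior probability of the higher-mean component, read as 'has disease'."""
    dens = component_densities(z, weights, means, sds)
    return dens[:, np.argmax(means)] / dens.sum(axis=1)

def simulate(n=2000, seed=0):
    """Simulated patients (hypothetical rates): code counts scale with utilization."""
    rng = np.random.default_rng(seed)
    disease = rng.random(n) < 0.3
    utilization = rng.integers(20, 201, size=n)
    count = rng.poisson(np.where(disease, 0.3, 0.02) * utilization)
    return count, utilization, disease

def auc(score, label):
    """Rank-based AUC: P(score of a case > score of a control), ties count half."""
    pos, neg = score[label], score[~label]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties
```

A typical use is `count, utilization, disease = simulate()`, then `z = normalize_count(count, utilization)`, `fit_two_normal_mixture(z)`, and `disease_probability(...)` to score each patient. The simulated `disease` labels are used only to evaluate the score with `auc`, never to fit it, which mirrors the label-free training described in the abstract.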

SUBMITTER: Yu S

PROVIDER: S-EPMC6251688 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

Enabling phenotypic big data with PheNorm.

Yu S, Ma Y, Gronsbell J, Cai T, Ananthakrishnan AN, Gainer VS, Churchill SE, Szolovits P, Murphy SN, Kohane IS, Liao KP, Cai T

Journal of the American Medical Informatics Association (JAMIA), 2018-01-01, 1

PMID: 29126253

Similar Datasets

Project description: Objective: The clinical features of epilepsy determine how it is defined, which in turn guides management. Therefore, consideration of the fundamental clinical entities that comprise an epilepsy is essential in the study of causes, trajectories, and treatment responses. The Human Phenotype Ontology (HPO) is used widely in clinical and research genetics for concise communication and modeling of clinical features, allowing extracted data to be harmonized using logical inference. We sought to redesign the HPO seizure subontology to improve its consistency with current epileptological concepts, supporting the use of large clinical data sets in high-throughput clinical and research genomics. Methods: We created a new HPO seizure subontology based on the 2017 International League Against Epilepsy (ILAE) Operational Classification of Seizure Types, and integrated concepts of status epilepticus, febrile, reflex, and neonatal seizures at different levels of detail. We compared the HPO seizure subontology prior to, and following, our revision, according to the information that could be inferred about the seizures of 791 individuals from three independent cohorts: 2 previously published and 150 newly recruited individuals. Each cohort's data were provided in a different format and harmonized using the two versions of the HPO. Results: The new seizure subontology increased the number of descriptive concepts for seizures 5-fold. The number of seizure descriptors that could be annotated to the cohort increased by 40% and the total amount of information about individuals' seizures increased by 38%. The most important qualitative difference was the relationship of focal to bilateral tonic-clonic seizure to generalized-onset and focal-onset seizures. Significance: We have generated a detailed contemporary conceptual map for harmonization of clinical seizure data, implemented in the official 2020-12-07 HPO release and freely available at hpo.jax.org. This will help to overcome the phenotypic bottleneck in genomics, facilitate reuse of valuable data, and ultimately improve diagnostics and precision treatment of the epilepsies.
