
Dataset Information


Phenotype Information Retrieval for Existing GWAS Studies.


ABSTRACT: The database of Genotypes and Phenotypes (dbGaP) archives the results of different Genome-Wide Association Studies (GWAS). dbGaP holds a multitude of phenotype variables, but they are not harmonized across studies. We proposed a method to standardize phenotype variables by grouping similar variables based on semantic distances. We first extracted the variable descriptions, enriched them using domain knowledge, and computed the pairwise distances among them. We then used clustering techniques to group the most similar variables. Domain experts audited the clusters and annotated them with appropriate labels, and we re-clustered the results to build a semantically-driven Genotypes and Phenotypes (sdGaP) ontology using the UMLS Semantic Network and Metathesaurus. The sdGaP ontology allowed us to expand user queries and retrieve information using a semantic metric called the density measure (DM). We illustrated the potential improvement in information retrieval offered by the sdGaP ontology in one search scenario using the variables from the Cleveland Family Study.
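
The distance-then-cluster step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance function or the clustering algorithm, so the sketch assumes a token-level Jaccard distance as a crude stand-in for a semantic distance and a simple single-linkage grouping under a fixed threshold. The function names, the threshold value, and the example variable descriptions are all hypothetical.

```python
from itertools import combinations


def tokens(description):
    """Lowercase a description and split it into a set of word tokens."""
    return set(description.lower().split())


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| over token sets.

    Assumption: token overlap stands in for the semantic distance used in
    the paper, which is not specified in the abstract.
    """
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def cluster(descriptions, threshold):
    """Single-linkage grouping: link every pair closer than `threshold`."""
    parent = list(range(len(descriptions)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(descriptions)), 2):
        if jaccard_distance(descriptions[i], descriptions[j]) < threshold:
            parent[find(j)] = find(i)

    # Collect descriptions by the root of their linked component.
    groups = {}
    for i, description in enumerate(descriptions):
        groups.setdefault(find(i), []).append(description)
    return list(groups.values())


# Hypothetical variable descriptions, not taken from dbGaP.
variables = [
    "body mass index",
    "body mass index at baseline",
    "systolic blood pressure",
    "diastolic blood pressure",
]
clusters = cluster(variables, threshold=0.6)
for group in clusters:
    print(group)
```

With these hypothetical inputs, the two body mass index descriptions fall into one group and the two blood pressure descriptions into another. A real pipeline would first enrich the descriptions with domain knowledge and would have the resulting clusters audited by domain experts, as the abstract describes.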

SUBMITTER: Alipanah N 

PROVIDER: S-EPMC3845737 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature


Publications

Phenotype Information Retrieval for Existing GWAS Studies.

Alipanah N, Lin KW, Venkatesh V, Farzaneh S, Kim HE

AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, 2013-03-18



Similar Datasets

| S-EPMC3865418 | biostudies-literature
| S-EPMC2832822 | biostudies-other
| S-EPMC5402950 | biostudies-literature
| S-EPMC2703930 | biostudies-literature
| S-EPMC10659118 | biostudies-literature
| S-EPMC5643841 | biostudies-literature
| S-EPMC7722071 | biostudies-literature
| S-EPMC1380203 | biostudies-literature
| S-EPMC1380199 | biostudies-other
| S-EPMC2729382 | biostudies-literature