Dataset Information

Harmonization and semantic annotation of data dictionaries from the Pharmacogenomics Research Network: a case study.


ABSTRACT: The Pharmacogenomics Research Network (PGRN) is a collaborative partnership of research groups funded by NIH to discover and understand how the genome contributes to an individual's response to medication. Since traditional biomedical research studies and clinical trials are often conducted independently, common and standardized representations for data are seldom used. This leads to heterogeneity in data representation, which hinders data reuse, data integration and meta-analyses. This study demonstrates harmonization and semantic annotation work for pharmacogenomics data dictionaries collected from PGRN research groups. A semi-automated system was developed to support the harmonization/annotation process for a total of 1514 PGRN variables. The process comprises four steps: (1) pre-processing PGRN variables; (2) decomposing and normalizing variable descriptions; (3) semantically annotating words and phrases using controlled terminologies; and (4) grouping PGRN variables into categories based on the annotation results and semantic types. Our results demonstrate that there is a significant amount of variability in how pharmacogenomics data is represented and that additional standardization efforts are needed. This represents a critical first step toward identifying and creating data standards for pharmacogenomics studies.

SUBMITTER: Zhu Q 

PROVIDER: S-EPMC3606279 | biostudies-literature | 2013 Apr

REPOSITORIES: biostudies-literature

Publications

Harmonization and semantic annotation of data dictionaries from the Pharmacogenomics Research Network: a case study.

Qian Zhu, Robert R. Freimuth, Zonghui Lian, Scott Bauer, Jyotishman Pathak, Cui Tao, Matthew J. Durski, Christopher G. Chute

Journal of Biomedical Informatics, 2012-11-29, issue 2

Similar Datasets

S-EPMC3817185 | biostudies-literature
S-EPMC6136943 | biostudies-literature
S-EPMC4772674 | biostudies-literature
S-EPMC5407152 | biostudies-literature
S-EPMC3714548 | biostudies-literature
S-EPMC7829634 | biostudies-literature
S-EPMC4675662 | biostudies-literature
S-EPMC6977333 | biostudies-literature
S-EPMC1618408 | biostudies-literature
S-EPMC9176854 | biostudies-literature