
Dataset Information


Evaluating and sharing global genetic ancestry in biomedical datasets.


ABSTRACT: Genetic ancestry is a critical co-factor to study phenotype-genotype associations using cohorts of human subjects. Most publicly available molecular datasets are, however, missing this information or only share self-reported race and ethnicity, representing a limitation to identify and repurpose datasets to investigate the contribution of ancestry to diseases and traits. We propose an analytical framework to enrich the metadata from publicly available cohorts with genetic ancestry information and a resulting diversity score at continental resolution, calculated directly from the data. We illustrate this framework using The Cancer Genome Atlas datasets searched through the DataMed Data Discovery Index. Data repositories and contributors can use this framework to provide genetic diversity measurements for controlled access datasets, minimizing the work involved in requesting a dataset that may ultimately prove inadequate for a researcher's purpose. With the increasing global scale of human genetics research, studies on disease risk and susceptibility would benefit greatly from the adequate estimation and sharing of genetic diversity in publicly available datasets following a framework such as the one presented.

SUBMITTER: Harismendy O 

PROVIDER: S-EPMC6433181 | biostudies-literature | 2019 May

REPOSITORIES: biostudies-literature


Publications

Evaluating and sharing global genetic ancestry in biomedical datasets.

Olivier Harismendy, Jihoon Kim, Xiaojun Xu, Lucila Ohno-Machado

Journal of the American Medical Informatics Association (JAMIA), 2019 May 1; issue 5



Similar Datasets

| S-EPMC2655927 | biostudies-literature
| S-EPMC10441317 | biostudies-literature
| S-EPMC10359386 | biostudies-literature
| S-EPMC5704547 | biostudies-literature
| S-EPMC4445803 | biostudies-literature
| S-EPMC7771964 | biostudies-literature
| S-EPMC2651580 | biostudies-literature
| S-EPMC8855358 | biostudies-literature
| S-EPMC5497850 | biostudies-other
| S-EPMC10370213 | biostudies-literature