Dataset Information

PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects.


ABSTRACT:

MOTIVATION: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery.

RESULTS: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (~4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.

AVAILABILITY AND IMPLEMENTATION: PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
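To make concrete what "statistically different genetic effects" between two subsets of individuals can mean, here is a minimal Python sketch. It is not PopCluster's procedure (which, per the abstract, combines logistic regression, principal component analysis, hierarchical clustering and tree parsing, and is implemented in R with PLINK and Eigensoft). It swaps in a much simpler technique: a two-sided z-test on the difference between two log odds ratios computed from allele counts. The function names and all counts are hypothetical and made up for illustration.

```python
import math


def log_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (log odds ratio, standard error) from allele counts.

    A 0.5 continuity correction is added to every cell so that
    zero counts do not cause a division by zero.
    """
    a, b, c, d = (x + 0.5 for x in (case_alt, case_ref, ctrl_alt, ctrl_ref))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def heterogeneity_z_test(group1, group2):
    """Two-sided z-test for a difference in log odds ratios between two groups."""
    lor1, se1 = log_odds_ratio(*group1)
    lor2, se2 = log_odds_ratio(*group2)
    z = (lor1 - lor2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Made-up allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
cluster_a = (120, 380, 60, 440)  # alternate allele enriched in cases
cluster_b = (80, 420, 85, 415)   # little case/control difference

z, p = heterogeneity_z_test(cluster_a, cluster_b)
print(f"z = {z:.2f}, p = {p:.4g}")
```

A small p-value here would suggest that the variant's association with the phenotype differs between the two hypothetical clusters. Unlike this sketch, which takes the two groups as given, the abstract describes PopCluster as discovering the subsets from genotype data without prior knowledge of ethnicity.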

SUBMITTER: Gurinovich A 

PROVIDER: S-EPMC6735784 | biostudies-literature | 2019 Sep

REPOSITORIES: biostudies-literature

Publications

PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects.

Anastasia Gurinovich, Harold Bae, John J. Farrell, Stacy L. Andersen, Stefano Monti, Annibale Puca, Gil Atzmon, Nir Barzilai, Thomas T. Perls, Paola Sebastiani

Bioinformatics (Oxford, England), 2019-09-01, issue 17



Similar Datasets

2021-05-01 | GSE164942 | GEO
| S-EPMC8323023 | biostudies-literature
2021-05-01 | GSE164938 | GEO
2021-05-01 | GSE164876 | GEO
2021-05-01 | GSE164870 | GEO
| PRJNA692488 | ENA
| S-EPMC7846078 | biostudies-literature
| S-EPMC8896389 | biostudies-literature
| PRJNA692489 | ENA
| S-EPMC2757009 | biostudies-literature