Dataset Information

POPSTR: Inference of Admixed Population Structure Based on Single-Nucleotide Polymorphisms and Copy Number Variations.


ABSTRACT: Statistical approaches for population structure estimation have been driven predominantly by one data type, single-nucleotide polymorphisms (SNPs). However, when identifiability from SNPs is weak, population structure estimation can lose accuracy. Copy number variations (CNVs) are genomic structural variants whose loci are commonly shared within a specific population, and they therefore provide valuable information for estimating the ancestry of sampled populations. We develop a Bayesian joint modeling framework for SNPs and CNVs, called POPSTR, that aims to capture population structure better than approaches that use SNPs alone. To handle the increased data volume, we use the Metropolis-adjusted Langevin algorithm (MALA), which samples from the target distribution in a computationally efficient way. We illustrate applications of our approach using the HapMap 2005 project data. We carry out simulation studies and show that the performance of our approach is comparable to or better than that of popular benchmarks, STRUCTURE and ADMIXTURE. We also observe that using only CNVs can be remarkably efficient if SNP data are not available.
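
The abstract names MALA but does not describe the sampler. As a rough illustration only, the sketch below runs a generic one-dimensional MALA chain on a toy standard normal target. It is not the POPSTR model or the authors' implementation; the function name `mala_sample`, the step size, and the toy target are all assumptions made for this example.

```python
import math
import random


def mala_sample(log_density, grad_log_density, x0, step, n_samples, seed=0):
    """Generic 1-D MALA chain (illustrative sketch, not the POPSTR sampler).

    Proposal: x' = x + (step^2 / 2) * grad log pi(x) + step * z, z ~ N(0, 1),
    followed by a Metropolis-Hastings accept/reject correction.
    """
    rng = random.Random(seed)
    half_step_sq = 0.5 * step * step

    def log_proposal(to, frm):
        # Log density (up to a constant) of proposing `to` from `frm`.
        mean = frm + half_step_sq * grad_log_density(frm)
        return -((to - mean) ** 2) / (2.0 * step * step)

    x = x0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        noise = rng.gauss(0.0, 1.0)
        proposal = x + half_step_sq * grad_log_density(x) + step * noise
        log_ratio = (
            log_density(proposal) + log_proposal(x, proposal)
            - log_density(x) - log_proposal(proposal, x)
        )
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


# Toy target (an assumption for this example): standard normal.
def toy_log_density(x):
    return -0.5 * x * x


def toy_grad_log_density(x):
    return -x


samples, accept_rate = mala_sample(
    toy_log_density, toy_grad_log_density, x0=3.0, step=1.0, n_samples=20000
)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"acceptance rate: {accept_rate:.2f}")
print(f"mean: {sample_mean:.3f}, variance: {sample_var:.3f}")
```

The gradient term pulls each proposal toward regions of higher density, which is why MALA usually needs fewer iterations than a plain random-walk proposal. A real application would replace the toy target with the model's log posterior and its gradient.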

SUBMITTER: Ahn J 

PROVIDER: S-EPMC5915226 | biostudies-literature | 2018 Apr

REPOSITORIES: biostudies-literature

Publications

POPSTR: Inference of Admixed Population Structure Based on Single-Nucleotide Polymorphisms and Copy Number Variations.

Jaeil Ahn, Brian Conkright, Simina M. Boca, Subha Madhavan

Journal of Computational Biology: a journal of computational molecular cell biology, 2018-01-02, issue 4

Similar Datasets

| S-EPMC4986450 | biostudies-literature
| S-EPMC3917743 | biostudies-literature
| S-EPMC2600609 | biostudies-literature
| S-EPMC2892635 | biostudies-other
| S-EPMC11353365 | biostudies-literature
| S-EPMC3600412 | biostudies-literature
| S-EPMC2859865 | biostudies-other
| S-EPMC9236577 | biostudies-literature
| S-EPMC2227926 | biostudies-literature
2011-02-01 | GSE26450 | GEO