Dataset Information

Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data.


ABSTRACT: We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
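
The iterative idea described in the abstract can be illustrated with a small sketch. This is not the PCAngsd implementation; it is a minimal illustration under stated assumptions: genotype likelihoods are given as an array of shape (individuals, sites, 3), the genotype prior is Hardy-Weinberg given an allele frequency, and individual allele frequencies are refreshed from a rank-K SVD reconstruction of the centered expected genotypes. The function names `expected_genotypes` and `estimate_individual_frequencies`, and the parameters `n_components`, `n_iter`, and `eps`, are hypothetical and chosen for this example only.

```python
import numpy as np


def expected_genotypes(gl, freq):
    """Posterior mean genotype (0..2) per individual and site.

    gl:   array (n_ind, n_sites, 3) of likelihoods P(data | G=g), g = 0, 1, 2
    freq: allele frequencies in (0, 1), broadcastable to (n_ind, n_sites)
    """
    f = np.broadcast_to(np.asarray(freq, dtype=float), gl.shape[:2])
    # Hardy-Weinberg prior over genotypes 0, 1, 2 (an assumption of this sketch)
    prior = np.stack([(1 - f) ** 2, 2 * f * (1 - f), f ** 2], axis=-1)
    post = gl * prior
    post /= post.sum(axis=-1, keepdims=True)
    return post @ np.array([0.0, 1.0, 2.0])


def estimate_individual_frequencies(gl, n_components=2, n_iter=20, eps=1e-4):
    """Heuristic loop: expected genotypes -> low-rank SVD -> updated frequencies."""
    n_ind, n_sites, _ = gl.shape
    # Crude population frequency from normalized likelihoods (illustrative choice)
    norm = gl / gl.sum(axis=-1, keepdims=True)
    pop = np.clip((norm @ np.array([0.0, 1.0, 2.0])).mean(axis=0) / 2, eps, 1 - eps)
    pi = np.broadcast_to(pop, (n_ind, n_sites)).copy()
    for _ in range(n_iter):
        # Expected genotypes using the current individual frequencies as prior
        e = expected_genotypes(gl, pi)
        # Keep only the top components of the centered expected genotypes
        u, s, vt = np.linalg.svd(e - 2 * pop, full_matrices=False)
        k = n_components
        low_rank = (u[:, :k] * s[:k]) @ vt[:k]
        # Map the reconstruction back to frequencies and keep them inside (0, 1)
        pi = np.clip((low_rank + 2 * pop) / 2, eps, 1 - eps)
    return pi, e
```

Calling `estimate_individual_frequencies(gl, n_components=2)` on a likelihood array returns the individual allele frequencies and the last set of expected genotypes; a PCA or a non-negative matrix factorization could then be run on those outputs, which is the role the abstract assigns to the estimated frequencies.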

SUBMITTER: Meisner J 

PROVIDER: S-EPMC6216594 | biostudies-literature | 2018 Oct

REPOSITORIES: biostudies-literature

Publications

Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data.

Jonas Meisner, Anders Albrechtsen

Genetics, 2018-08-21, issue 2


Similar Datasets

| S-EPMC8796377 | biostudies-literature
| S-EPMC6543773 | biostudies-literature
| S-EPMC3813857 | biostudies-other
| S-EPMC2867064 | biostudies-literature
| S-EPMC9069078 | biostudies-literature
| S-EPMC3826632 | biostudies-literature
2010-04-08 | GSE21248 | GEO
| S-EPMC1448708 | biostudies-literature
2010-04-07 | E-GEOD-21248 | biostudies-arrayexpress
| S-EPMC3907104 | biostudies-literature