Dataset Information

Accurate inference of subtle population structure (and other genetic discontinuities) using principal coordinates.


ABSTRACT:

Background

Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely used methods to infer population structure are model-based Bayesian MCMC procedures that minimize Hardy-Weinberg disequilibrium and linkage disequilibrium within subpopulations. These methods are useful, but they suffer from large computational requirements and depend on modeling assumptions that may not be met in real data sets. Here we describe the development of a new approach, PCO-MC, which couples principal coordinate analysis to a clustering procedure for the inference of population structure from multilocus genotype data.
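The principal coordinate step can be illustrated with a minimal sketch. This is not the PCO-MC implementation: the 0/1/2 genotype coding, the allele-sharing distance, and the function names are assumptions made for illustration, and power iteration stands in for a full eigendecomposition, so only the leading axis is recovered (PCO-MC uses all axes).

```python
def allele_sharing_distance(g1, g2):
    """Mean per-locus difference between two diploid genotypes (0/1/2 allele counts)."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def double_center(d):
    """Gower-center the squared distances: B = -1/2 * J * D^2 * J."""
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def first_principal_coordinate(genotypes, iterations=500):
    """Return each individual's score on the leading principal coordinate axis."""
    n = len(genotypes)
    d = [[allele_sharing_distance(genotypes[i], genotypes[j]) for j in range(n)]
         for i in range(n)]
    b = double_center(d)
    # Power iteration for the dominant eigenvector of B (deterministic start).
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # all individuals identical: nothing to project
            return [0.0] * n
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue; scores are sqrt(eigenvalue) * eigenvector.
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    scale = max(eigenvalue, 0.0) ** 0.5
    return [scale * x for x in v]


# Hypothetical toy data: six individuals, four biallelic loci, two obvious groups.
genotypes = [
    [0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0],
    [2, 2, 2, 2], [2, 2, 1, 2], [2, 1, 2, 2],
]
scores = first_principal_coordinate(genotypes)
```

In this toy case the first three individuals fall on one side of the axis and the last three on the other; the sign of the axis itself is arbitrary.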

Methodology/principal findings

PCO-MC uses data from all principal coordinate axes simultaneously to calculate a multidimensional "density landscape", from which the number of subpopulations and the membership within each are determined using a valley-seeking algorithm. Using extensive simulations, we show that this approach outperforms a Bayesian MCMC procedure when many loci (e.g., 100) are sampled, but that the Bayesian procedure is marginally superior with few loci (e.g., 10). When presented with sufficient data, PCO-MC accurately delineated subpopulations with population F_ST values as low as 0.03 (G'_ST > 0.2), whereas the limit of resolution of the Bayesian approach was F_ST = 0.05 (G'_ST > 0.35).
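The idea of reading clusters off a density landscape can be sketched as follows. This is a simplified mode-seeking stand-in, not the valley-seeking procedure used by PCO-MC: density is approximated by counting neighbors within a fixed radius, each point is linked to the densest point in its neighborhood, and points that reach the same local density peak share a cluster. The function name, the `radius` parameter, and the toy coordinates are assumptions for illustration.

```python
import math


def mode_seeking_clusters(points, radius):
    """Assign each point to the local density peak reached by uphill links."""
    n = len(points)
    # Neighborhood of each point (including itself) within the given radius.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= radius]
                 for i in range(n)]
    # Crude density estimate: number of points inside the neighborhood.
    density = [len(nb) for nb in neighbors]
    # Link each point to the densest point nearby (ties go to the lowest index).
    # A link to another point always increases (density, -index), so no cycles occur.
    parent = [max(neighbors[i], key=lambda j: (density[j], -j)) for i in range(n)]
    labels, root_ids = [], {}
    for i in range(n):
        root = i
        while parent[root] != root:  # climb until a local peak is reached
            root = parent[root]
        labels.append(root_ids.setdefault(root, len(root_ids)))
    return labels


# Hypothetical principal coordinate scores for seven individuals on two axes.
coords = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
labels = mode_seeking_clusters(coords, radius=0.5)
print(labels)                 # [0, 0, 0, 0, 1, 1, 1]
print(len(set(labels)))       # 2 inferred clusters
```

The number of clusters is not supplied in advance; it falls out of how many density peaks are separated by empty regions at the chosen radius. That choice of radius matters: too small and flat regions fragment into many peaks, too large and distinct groups merge.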

Conclusions/significance

We draw a distinction between population structure inference for describing biodiversity and inference for Type I error control in associative genetics. We suggest that discrete assignments, like those produced by PCO-MC, are appropriate for circumscribing units of biodiversity, whereas expression of population structure as a continuous variable is more useful for case-control correction in structured association studies.

SUBMITTER: Reeves PA 

PROVIDER: S-EPMC2625398 | biostudies-literature | 2009

REPOSITORIES: biostudies-literature

Publications

Accurate inference of subtle population structure (and other genetic discontinuities) using principal coordinates.

Reeves, Patrick A.; Richards, Christopher M.

PLoS ONE, 2009-01-27, issue 1


Similar Datasets

| S-EPMC7643466 | biostudies-literature
| S-EPMC3266881 | biostudies-literature
| S-EPMC4224995 | biostudies-literature
| S-EPMC2912642 | biostudies-literature
2010-08-16 | E-GEOD-23636 | biostudies-arrayexpress
2010-08-16 | GSE23636 | GEO
| S-EPMC3024871 | biostudies-other
| S-EPMC5644186 | biostudies-literature
| S-EPMC3129529 | biostudies-literature
| S-EPMC5856401 | biostudies-literature