Dataset Information

Probabilistic models of genetic variation in structured populations applied to global human studies.

ABSTRACT:

Motivation

Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important problem is how to formulate and estimate probabilistic models of observed genotypes that account for complex population structure. The most prominent work on this problem has focused on estimating a model of admixture proportions of ancestral populations for each individual. Here, we instead focus on modeling variation of the genotypes without requiring a higher-level admixture interpretation.

Results

We formulate two general probabilistic models, and we propose computationally efficient algorithms to estimate them. First, we show how principal component analysis can be utilized to estimate a general model that includes the well-known Pritchard-Stephens-Donnelly admixture model as a special case. Noting some drawbacks of this approach, we introduce a new 'logistic factor analysis' framework that seeks to directly model the logit transformation of probabilities underlying observed genotypes in terms of latent variables that capture population structure. We demonstrate these advances on data from the Human Genome Diversity Panel and 1000 Genomes Project, where we are able to identify SNPs that are highly differentiated with respect to structure while making minimal modeling assumptions.
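As an illustration only, the two ideas in the Results paragraph can be sketched with NumPy. The abstract does not give the authors' estimation algorithms, so this is not their method and not the API of the lfa package. The function names, the dimensions, the intercept row in the latent factors, and the Binomial(2, p) genotype sampling are all assumptions made for this sketch. The first function simulates genotypes whose allele frequencies have logits that are linear in latent variables; the second forms a PCA-style low-rank estimate of those frequencies. A linear low-rank fit is not constrained to [0, 1], so the sketch clips it; the abstract does not say whether this is among the drawbacks the authors refer to.

```python
import numpy as np


def simulate_genotypes(n_snps=200, n_ind=50, d=3, seed=0):
    """Draw genotypes from a logistic latent-factor model (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Latent factors H (d x n_ind); the first row is an intercept of ones (assumption).
    H = np.vstack([np.ones(n_ind), rng.normal(size=(d - 1, n_ind))])
    # SNP-specific loadings A (n_snps x d).
    A = rng.normal(scale=0.5, size=(n_snps, d))
    # The logit of each allele frequency is linear in the latent factors.
    P = 1.0 / (1.0 + np.exp(-(A @ H)))
    # Genotypes count alleles, so each entry is 0, 1, or 2.
    X = rng.binomial(2, P)
    return X, P


def pca_allele_freqs(X, d):
    """Low-rank (PCA-style) estimate of individual-specific allele frequencies."""
    half = X / 2.0
    row_means = half.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(half - row_means, full_matrices=False)
    k = d - 1  # the per-SNP mean plays the role of the intercept dimension
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k, :] + row_means
    # The linear fit can fall outside [0, 1], so clip to valid probabilities.
    return np.clip(low_rank, 0.0, 1.0)


X, P = simulate_genotypes()
P_hat = pca_allele_freqs(X, d=3)
```

Modeling the logit directly, as the logistic factor analysis framework in the abstract does, keeps every fitted probability inside (0, 1) by construction, which is the contrast this sketch is meant to show.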

Availability and implementation

A Bioconductor R package called lfa is available at http://www.bioconductor.org/packages/release/bioc/html/lfa.html

Contact

jstorey@princeton.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Hao W

PROVIDER: S-EPMC4795615 | biostudies-literature | 2016 Mar

REPOSITORIES: biostudies-literature

Publications

Probabilistic models of genetic variation in structured populations applied to global human studies.

Wei Hao, Minsun Song, John D. Storey

Bioinformatics (Oxford, England), 2015-11-06, issue 5

PMID: 26545820

Similar Datasets

Probabilistic models for neural populations that naturally capture global coupling and criticality. | S-EPMC5621705 | biostudies-other

Scaling probabilistic models of genetic variation to millions of humans. | S-EPMC5127768 | biostudies-literature

Matching strategies for genetic association studies in structured populations. | S-EPMC1181929 | biostudies-literature

Clonal interference, genetic variation and the speed of evolution in structured populations. | S-EPMC11870626 | biostudies-literature

A global reference for human genetic variation. | S-EPMC4750478 | biostudies-literature

Characterization of genetic variation and natural selection at the arylamine N-acetyltransferase genes in global human populations. | S-EPMC4653814 | biostudies-literature

Assessment of Genetic Heterogeneity in Structured Plant Populations Using Multivariate Whole-Genome Regression Models. | S-EPMC4566272 | biostudies-literature

Genetic variation of pre-mRNA alternative splicing in human populations. | S-EPMC3339278 | biostudies-literature

Integrating common and rare genetic variation in diverse human populations. | S-EPMC3173859 | biostudies-literature

Scalable probabilistic PCA for large-scale genetic variation data. | S-EPMC7286535 | biostudies-literature