Dataset Information

Fast admixture analysis and population tree estimation for SNP and NGS data.

ABSTRACT: Motivation: Structure methods are highly used population genetic methods for classifying individuals in a sample fractionally into discrete ancestry components. Contribution: We introduce a new optimization algorithm for the classical STRUCTURE model in a maximum likelihood framework. Using analyses of real data we show that the new method finds solutions with higher likelihoods than the state-of-the-art method in the same computational time. The optimization algorithm is also applicable to models based on genotype likelihoods, that can account for the uncertainty in genotype-calling associated with Next Generation Sequencing (NGS) data. We also present a new method for estimating population trees from ancestry components using a Gaussian approximation. Using coalescence simulations of diverging populations, we explore the adequacy of the STRUCTURE-style models and the Gaussian assumption for identifying ancestry components correctly and for inferring the correct tree. In most cases, ancestry components are inferred correctly, although sample sizes and times since admixture can influence the results. We show that the popular Gaussian approximation tends to perform poorly under extreme divergence scenarios, e.g. with very long branch lengths, but the topologies of the population trees are accurately inferred in all scenarios explored. The new methods are implemented together with appropriate visualization tools in the software package Ohana. Availability and Implementation: Ohana is publicly available at https://github.com/jade-cheng/ohana. In addition to source code and installation instructions, we also provide example work-flows in the project wiki site. Contact: jade.cheng@birc.au.dk. Supplementary information: Supplementary data are available at Bioinformatics online.

SUBMITTER: Cheng JY

PROVIDER: S-EPMC6543773 | biostudies-literature | 2017 Jul

REPOSITORIES: biostudies-literature

Publications

Fast admixture analysis and population tree estimation for SNP and NGS data.

Cheng, Jade Yu; Mailund, Thomas; Nielsen, Rasmus

Bioinformatics (Oxford, England), 2017 Jul 1, issue 14

PMID: 28334108

Similar Datasets

Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data.
| S-EPMC6216594 | biostudies-literature

Fast Estimation of Recombination Rates Using Topological Data Analysis.
| S-EPMC6456321 | biostudies-literature

The effect of genomic inversions on estimation of population genetic parameters from SNP data.
| S-EPMC3527249 | biostudies-literature

Synggen: fast and data-driven generation of synthetic heterogeneous NGS cancer data.
| S-EPMC9825741 | biostudies-literature

Fine-Scale Population Admixture Landscape of Tai-Kadai-Speaking Maonan in Southwest China Inferred From Genome-Wide SNP Data.
| S-EPMC8891617 | biostudies-literature

Estimation of SNP heritability from dense genotype data.
| S-EPMC3852919 | biostudies-literature

Fast covariance estimation for sparse functional data.
| S-EPMC5807553 | biostudies-literature

Fast Covariance Estimation for Multivariate Sparse Functional Data.
| S-EPMC8276768 | biostudies-literature

Principal components analysis of population admixture.
| S-EPMC3392282 | biostudies-literature

An integrative variant analysis pipeline for accurate genotype/haplotype inference in population NGS data.
| S-EPMC3638139 | biostudies-literature