
Dataset Information


Fast spatial ancestry via flexible allele frequency surfaces.


ABSTRACT: Unique modeling and computational challenges arise in locating the geographic origin of individuals based on their genetic backgrounds. Single-nucleotide polymorphisms (SNPs) vary widely in informativeness, allele frequencies change non-linearly with geography, and reliable localization requires evidence to be integrated across a multitude of SNPs. These problems become even more acute for individuals of mixed ancestry. It is hardly surprising that matching genetic models to computational constraints has limited the development of methods for estimating geographic origins. We attack these related problems by borrowing ideas from image processing and optimization theory. Our proposed model divides the region of interest into pixels and operates SNP by SNP. We estimate allele frequencies across the landscape by maximizing a product of binomial likelihoods penalized by nearest neighbor interactions. Penalization smooths allele frequency estimates and promotes estimation at pixels with no data. Maximization is accomplished by a minorize-maximize (MM) algorithm. Once allele frequency surfaces are available, one can apply Bayes' rule to compute the posterior probability that each pixel is the pixel of origin of a given person. Placement of admixed individuals on the landscape is more complicated and requires estimation of the fractional contribution of each pixel to a person's genome. This estimation problem also succumbs to a penalized MM algorithm.

We applied the model to the Population Reference Sample (POPRES) data. The model gives better localization for both unmixed and admixed individuals than existing methods despite using just a small fraction of the available SNPs. Computing times are comparable with the best competing software.

Software will be freely available as the OriGen package in R.

Contact: ranolaj@uw.edu or klange@ucla.edu

Supplementary data are available at Bioinformatics online.
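The Bayes' rule step described in the abstract can be illustrated with a minimal sketch. This is not the OriGen implementation: the function name `pixel_posterior`, the array layout, the uniform default prior, and the assumptions of independent SNPs with binomial (two-draw) genotype likelihoods at each pixel are all illustrative choices made here. The allele frequency surfaces are taken as given inputs, so the penalized MM fitting step is not shown.

```python
import numpy as np

def pixel_posterior(freqs, genotype, prior=None):
    """Posterior probability that each pixel is the pixel of origin.

    Hypothetical helper for illustration only.
    freqs    : array (n_snps, n_pixels), estimated allele frequencies,
               assumed strictly between 0 and 1.
    genotype : array (n_snps,), allele counts 0, 1 or 2 for one person.
    prior    : optional array (n_pixels,); uniform if omitted (an assumption).
    """
    freqs = np.asarray(freqs, dtype=float)
    g = np.asarray(genotype, dtype=float)[:, None]
    n_pixels = freqs.shape[1]
    if prior is None:
        prior = np.full(n_pixels, 1.0 / n_pixels)

    # Binomial(2, p) log-likelihood per SNP and pixel. The binomial
    # coefficient is dropped because it does not depend on the pixel.
    loglik = (g * np.log(freqs) + (2.0 - g) * np.log1p(-freqs)).sum(axis=0)

    # Bayes' rule in log space, shifted by the maximum for numerical stability.
    logpost = loglik + np.log(prior)
    logpost -= logpost.max()
    post = np.exp(logpost)
    return post / post.sum()

# Toy example: two SNPs, two pixels; the person carries two copies of each allele.
toy_freqs = [[0.9, 0.1],
             [0.8, 0.2]]
toy_posterior = pixel_posterior(toy_freqs, [2, 2])
```

In this toy case the allele is common in the first pixel and rare in the second, so the posterior mass concentrates on the first pixel. Working in log space matters in practice because a product over many SNPs underflows quickly.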

SUBMITTER: Ranola JM 

PROVIDER: S-EPMC4184261 | biostudies-literature | 2014 Oct

REPOSITORIES: biostudies-literature


Publications

Fast spatial ancestry via flexible allele frequency surfaces.

Rañola JM, Novembre J, Lange K

Bioinformatics (Oxford, England), 2014-07-09, issue 20



Similar Datasets

| S-EPMC4937201 | biostudies-literature
| S-EPMC8324296 | biostudies-literature
| S-EPMC4301802 | biostudies-literature
| S-EPMC8263560 | biostudies-literature
| S-EPMC6668412 | biostudies-literature
2021-05-06 | E-MTAB-10349 | biostudies-arrayexpress
| S-EPMC3587605 | biostudies-literature
| S-EPMC7542712 | biostudies-literature
| S-EPMC6538377 | biostudies-literature
| S-EPMC8402441 | biostudies-literature