Dataset Information

Whole population, genome-wide mapping of hidden relatedness.

ABSTRACT: We present GERMLINE, a robust algorithm for identifying segmental sharing indicative of recent common ancestry between pairs of individuals. Unlike methods with comparable objectives, GERMLINE scales linearly with the number of samples, enabling analysis of whole-genome data in large cohorts. Our approach is based on a dictionary of haplotypes that is used to efficiently discover short exact matches between individuals. We then expand these matches using dynamic programming to identify long, nearly identical segmental sharing that is indicative of relatedness. We use GERMLINE to comprehensively survey hidden relatedness both in the HapMap and in a densely typed island population of 3000 individuals. We verify that GERMLINE is in concordance with other methods when they can process the data, and that it also enables analysis of larger-scale studies. We bolster these results by demonstrating novel applications of precise analysis of hidden relatedness for (1) identifying and resolving phasing errors and (2) exposing polymorphic deletions that are otherwise challenging to detect. The deletion finding is supported by the concordance of detected deletions with evidence from independent databases and with statistical analyses of fluorescence intensity data, which GERMLINE does not use.
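The seed-and-extend idea in the abstract can be illustrated with a small sketch. It assumes phased haplotypes given as equal-length strings of 0/1 alleles, cuts them into fixed-length words, and uses a per-window dictionary so that only haplotypes carrying an identical word are ever paired (these pairs are the seeds). The function name `shared_segments` and the parameters `word_len`, `min_words` and `max_gap` are hypothetical. The extension step here is a greedy merge of seed windows separated by at most `max_gap` mismatching windows, a simplified stand-in for the dynamic-programming extension described above, not GERMLINE's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Segment = Tuple[int, int, int, int]  # (hap_a, hap_b, start_site, end_site_exclusive)


def shared_segments(haplotypes: List[str], word_len: int, min_words: int,
                    max_gap: int = 0) -> List[Segment]:
    """Hypothetical seed-and-extend sketch; not GERMLINE's actual code."""
    if not haplotypes:
        return []
    n_words = len(haplotypes[0]) // word_len  # a trailing partial word is ignored

    # Seeding: per window, a dictionary maps each word to the haplotypes carrying it,
    # so only haplotypes with an identical word are ever compared as a pair.
    seeds: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
    for w in range(n_words):
        buckets: Dict[str, List[int]] = defaultdict(list)
        for h, hap in enumerate(haplotypes):
            buckets[hap[w * word_len:(w + 1) * word_len]].append(h)
        for members in buckets.values():
            for a in range(len(members)):
                for b in range(a + 1, len(members)):
                    seeds[(members[a], members[b])].add(w)

    # Extension: merge each pair's seed windows into runs, tolerating gaps of
    # up to max_gap non-matching windows (a greedy stand-in for the DP step).
    segments: List[Segment] = []
    for (a, b), windows in seeds.items():
        ordered = sorted(windows)
        start = prev = ordered[0]
        for w in ordered[1:] + [None]:
            if w is not None and w - prev - 1 <= max_gap:
                prev = w  # still inside the current run
                continue
            if prev - start + 1 >= min_words:  # run is long enough to report
                segments.append((a, b, start * word_len, (prev + 1) * word_len))
            if w is not None:
                start = prev = w  # begin a new run at this seed window
    return sorted(segments)
```

Pairs are formed only inside dictionary buckets, so the work depends on how many haplotypes share each word rather than on all pairs of samples; when most words are carried by few haplotypes, this is the intuition behind the linear scaling the abstract reports.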

SUBMITTER: Gusev A

PROVIDER: S-EPMC2652213 | biostudies-literature | 2009 Feb

REPOSITORIES: biostudies-literature


Publications

Whole population, genome-wide mapping of hidden relatedness.

Alexander Gusev, Jennifer K. Lowe, Markus Stoffel, Mark J. Daly, David Altshuler, Jan L. Breslow, Jeffrey M. Friedman, Itsik Pe'er

Genome Research, issue 2 (e-published 2008-10-29)


PMID: 18971310

Similar Datasets

Project description: Since most analysis software for genome-wide association studies (GWAS) currently exploits only unrelated individuals, there is a need for efficient applications that can handle general pedigree data or mixtures of both population and pedigree data. Even datasets thought to consist of only unrelated individuals may include cryptic relationships that can lead to false positives if not discovered and controlled for. In addition, family designs possess compelling advantages. They are better equipped to detect rare variants, control for population stratification, and facilitate the study of parent-of-origin effects. Pedigrees selected for extreme trait values often segregate a single gene with strong effect. Finally, many pedigrees are available as an important legacy from the era of linkage analysis. Unfortunately, pedigree likelihoods are notoriously hard to compute. In this paper, we reexamine the computational bottlenecks and implement ultra-fast pedigree-based GWAS analysis. Kinship coefficients can either be based on explicitly provided pedigrees or automatically estimated from dense markers. Our strategy (a) works for random sample data, pedigree data, or a mix of both; (b) entails no loss of power; (c) allows for any number of covariate adjustments, including correction for population stratification; (d) allows for testing SNPs under additive, dominant, and recessive models; and (e) accommodates both univariate and multivariate quantitative traits. On a typical personal computer (six CPU cores at 2.67 GHz), analyzing a univariate HDL (high-density lipoprotein) trait from the San Antonio Family Heart Study (935,392 SNPs on 1,388 individuals in 124 pedigrees) takes less than 2 min and 1.5 GB of memory. Complete multivariate QTL analysis of the three time-points of the longitudinal HDL multivariate trait takes less than 5 min and 1.5 GB of memory.
The algorithm is implemented as the Ped-GWAS Analysis (Option 29) in the Mendel statistical genetics package, which is freely available for Macintosh, Linux, and Windows platforms from http://genetics.ucla.edu/software/mendel.
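The description says kinship coefficients can be estimated automatically from dense markers but does not state the estimator. As an illustration only, the sketch below uses one widely used choice: half of the standardized genomic relationship (GRM-style) estimate, computed from genotypes coded as allele counts 0/1/2 with allele frequencies taken from the sample. This is an assumption, not necessarily what Mendel's Option 29 computes, and the function name `kinship_matrix` is hypothetical.

```python
from typing import List


def kinship_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Illustrative GRM-style kinship estimate (an assumption, not Mendel's code).

    genotypes[s][i] is the allele count (0, 1 or 2) of sample s at SNP i.
    Returns phi[j][k] = (1 / 2M) * sum_i (x_ij - 2p_i)(x_ik - 2p_i) / (2 p_i (1 - p_i)),
    i.e. half the standardized genomic relationship, over the M polymorphic SNPs.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0]) if n else 0
    totals = [[0.0] * n for _ in range(n)]
    used = 0
    for i in range(n_snps):
        column = [row[i] for row in genotypes]
        p = sum(column) / (2 * n)  # allele frequency estimated from the sample
        if p == 0.0 or p == 1.0:
            continue  # monomorphic SNP: no information and zero variance
        used += 1
        var = 2 * p * (1 - p)
        centred = [x - 2 * p for x in column]
        for j in range(n):
            for k in range(n):
                totals[j][k] += centred[j] * centred[k] / var
    if used == 0:
        raise ValueError("no polymorphic SNPs to estimate kinship from")
    return [[value / (2 * used) for value in row] for row in totals]
```

With many SNPs and a reasonably large sample, off-diagonal values for unrelated pairs should scatter around 0 and first-degree relatives around 0.25; in a toy sample the in-sample allele frequencies shift the values considerably.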

Project description:
Background: Cultivated tomato (Solanum lycopersicum L.) is the second most important vegetable crop after potato and one of thirteen interfertile species of the genus Solanum. Domestication and continuous selection for desirable traits have made cultivated tomato more susceptible to many stresses than the wild species. In this study, we analyzed and compared the genomes of wild and cultivated tomato accessions to identify the genomic regions that changed during domestication.
Results: The analysis was based on SNP and InDel mining of twenty-nine accessions of twelve wild tomato species and forty accessions of cultivated tomato. The percentage of common SNPs among the accessions within a species corresponded with the reproductive behavior of the species. SNP profiles of the wild tomato species within a phylogenetic subsection varied with their geographical distribution. Interestingly, the ratio of genic SNPs to total SNPs increased with the phylogenetic distance of the wild tomato species from the domesticated species, suggesting that variation in gene-coding regions plays a major role in speciation. We retrieved 2439 physical positions in 1594 genes, including 32 resistance-related genes, where all the wild accessions possessed a common wild variant allele different from all the cultivated accessions studied. Tajima's D analysis predicted very strong purifying selection associated with domestication in nearly 1% of the genome, half of which is contributed by chromosome 11. This genomic region with a low Tajima's D value hosts a variety of genes associated with important agronomic traits such as fruit size, tiller number and wax deposition.
Conclusion: Our analysis revealed a broad-spectrum genetic base in wild tomato species and its erosion in cultivated tomato due to recurrent selection for agronomically important traits.
Identification of the common wild variant alleles and of the genomic regions undergoing purifying selection during cultivation would facilitate future breeding programs through introgression from wild species.
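Tajima's D, used in the tomato study to flag regions under selection, compares two estimates of diversity: the mean number of pairwise differences (pi) and Watterson's estimate S / a1, where S is the number of segregating sites. The sketch below implements the standard Tajima (1989) formula for aligned, equal-length sequences; the function names `diversity_stats` and `tajimas_d` are hypothetical, and real analyses typically compute D over genomic windows of SNP calls rather than toy strings. Negative values indicate an excess of rare variants relative to neutral expectation.

```python
import math
from itertools import combinations
from typing import List, Tuple


def diversity_stats(seqs: List[str]) -> Tuple[int, float]:
    """Return (S, pi) for aligned, equal-length sequences.

    S  = number of segregating (polymorphic) sites
    pi = mean number of pairwise differences over all sequence pairs
    """
    segregating = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return segregating, total_diffs / len(pairs)


def tajimas_d(n: int, S: int, pi: float) -> float:
    """Tajima's D for n sequences (for n <= 3 the variance term is zero)."""
    if n < 4 or S == 0:
        raise ValueError("Tajima's D needs at least 4 sequences and one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Numerator: pi minus Watterson's estimate S / a1; denominator: its standard deviation.
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

For example, `S, pi = diversity_stats(["AA", "AT", "TT", "TA"])` followed by `tajimas_d(4, S, pi)` gives a positive D, because both sites are at intermediate frequency.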
