Project description:Aggregate results from genome-wide association studies (GWAS), such as genotype frequencies for cases and controls, were until recently often made available on public websites because they were thought to disclose negligible information concerning an individual's participation in a study. Homer et al. recently suggested that a method for forensic detection of an individual's contribution to an admixed DNA sample could be applied to aggregate GWAS data. Using a likelihood-based statistical framework, we developed an improved statistic that uses genotype frequencies and individual genotypes to infer whether a specific individual or any close relatives participated in the GWAS and, if so, what the participant's phenotype status is. Our statistic compares the logarithm of genotype frequencies, in contrast to that of Homer et al., which is based on differences in either SNP probe intensity or allele frequencies. We derive the theoretical power of our test statistics and explore the empirical performance in scenarios with varying numbers of randomly chosen or top-associated SNPs.
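As a rough illustration of a statistic built on the logarithm of genotype frequencies, the sketch below sums, over SNPs, the log ratio of the frequency of an individual's genotype in a case pool to its frequency in a reference pool. This is not the authors' exact statistic. The function name, the 0/1/2 genotype coding, the choice of a reference pool, and the `floor` guard against zero frequencies are all assumptions made for illustration.

```python
import math


def log_genotype_frequency_ratio(genotypes, case_freqs, ref_freqs, floor=1e-6):
    """Hypothetical sketch: sum over SNPs of log(f_case(g) / f_ref(g)).

    genotypes  : list of genotype codes (0, 1, or 2 copies of an allele)
    case_freqs : per-SNP tuples (f0, f1, f2) of genotype frequencies in the case pool
    ref_freqs  : per-SNP tuples (f0, f1, f2) in a reference pool
    floor      : assumed lower bound that avoids taking log(0)
    """
    total = 0.0
    for g, f_case, f_ref in zip(genotypes, case_freqs, ref_freqs):
        total += math.log(max(f_case[g], floor)) - math.log(max(f_ref[g], floor))
    return total


# Toy example: both genotypes are twice as frequent in the case pool.
score = log_genotype_frequency_ratio(
    [0, 2],
    [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5)],
    [(0.25, 0.5, 0.25), (0.25, 0.5, 0.25)],
)
```

A positive score means the individual's genotypes are, on balance, more typical of the case pool than of the reference pool. Calibrating such a score into a test requires the power analysis described in the abstract.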
Project description:Cells consist of molecular modules which perform vital biological functions. Cellular modules are key units of adaptive evolution because organismal fitness depends on their performance. Theory shows that in rapidly evolving populations, such as those of many microbes, adaptation is driven primarily by common beneficial mutations with large effects, while other mutations behave as if they are effectively neutral. As a consequence, if a module can be improved only by rare and/or weak beneficial mutations, its adaptive evolution would stall. However, such evolutionary stalling has not been empirically demonstrated, and it is unclear to what extent stalling may limit the power of natural selection to improve modules. Here we empirically characterize how natural selection improves the translation machinery (TM), an essential cellular module. We experimentally evolved populations of Escherichia coli with genetically perturbed TMs for 1,000 generations. Populations with severe TM defects initially adapted via mutations in the TM, but TM adaptation stalled within about 300 generations. We estimate that the genetic load in our populations incurred by residual TM defects ranges from 0.5 to 19%. Finally, we found evidence that both epistasis and the depletion of the pool of beneficial mutations contributed to evolutionary stalling. Our results suggest that cellular modules may not be fully optimized by natural selection despite the availability of adaptive mutations.
Project description:Comparing allele frequencies among populations that differ in environment has long been a tool for detecting loci involved in local adaptation. However, such analyses are complicated by an imperfect knowledge of population allele frequencies and neutral correlations of allele frequencies among populations due to shared population history and gene flow. Here we develop a set of methods to robustly test for unusual allele frequency patterns and correlations between environmental variables and allele frequencies while accounting for these complications based on a Bayesian model previously implemented in the software Bayenv. Using this model, we calculate a set of "standardized allele frequencies" that allows investigators to apply tests of their choice to multiple populations while accounting for sampling and covariance due to population history. We illustrate this first by showing that these standardized frequencies can be used to detect nonparametric correlations with environmental variables; these correlations are also less prone to spurious results due to outlier populations. We then demonstrate how these standardized allele frequencies can be used to construct a test to detect SNPs that deviate strongly from neutral population structure. This test is conceptually related to FST and is shown to be more powerful, as we account for population history. We also extend the model to next-generation sequencing of population pools, a cost-efficient way to estimate population allele frequencies, but one that introduces an additional level of sampling noise. The utility of these methods is demonstrated in simulations and by reanalyzing human SNP data from the Human Genome Diversity Panel populations and pooled next-generation sequencing data from Atlantic herring. An implementation of our method is available from http://gcbias.org.
Project description:The impact of the highly polymorphic Killer-cell immunoglobulin-like receptor (KIR) gene cluster on the outcome of hematopoietic stem cell transplantation (HSCT) is a subject of current research. To further understand the involvement of this gene family in Natural Killer (NK) cell-mediated graft-versus-leukemia reactions, knowledge of haplotype structures and allelic linkage is of importance. In this analysis, we estimate population-specific KIR haplotype frequencies at allele group resolution in a cohort of n = 458 German families. We addressed the polymorphism of the KIR gene complex and phasing ambiguities by a combined approach. Haplotype inference within first-degree family relations allowed us to limit the number of possible diplotypes. Structural restriction to a pattern set of 92 previously described KIR copy number haplotypes further reduced ambiguities. KIR haplotype frequency estimation was finally accomplished by means of an expectation-maximization algorithm. Applying a resolution threshold of ½ n, we were able to identify a set of 551 KIR allele group haplotypes, representing 21 KIR copy number haplotypes. The haplotype frequencies allow studying linkage disequilibrium in two-locus as well as in multi-locus analyses. Our study reveals associations between KIR haplotype structures and allele group frequencies, thereby broadening our understanding of the KIR gene complex.
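The expectation-maximization step can be sketched generically as follows. This is a minimal textbook EM for haplotype frequencies under Hardy-Weinberg proportions, not the study's pipeline. It assumes each individual has already been reduced to a list of compatible unordered diplotypes (for example by family-based inference and structural restriction). The function name, the input layout, and the toy haplotype labels are hypothetical.

```python
from collections import defaultdict


def em_haplotype_frequencies(candidates, n_iter=100, tol=1e-10):
    """Hypothetical sketch of EM haplotype frequency estimation.

    candidates: one entry per individual, each a list of unordered
                diplotypes (h1, h2) compatible with that individual's data.
    Returns a dict mapping haplotype label -> estimated frequency.
    """
    haplotypes = sorted({h for dips in candidates for d in dips for h in d})
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    n_chrom = 2.0 * len(candidates)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for dips in candidates:
            # E-step: probability of each unordered diplotype under current
            # frequencies (factor 2 for heterozygous pairs).
            weights = [(1.0 if a == b else 2.0) * freqs[a] * freqs[b]
                       for a, b in dips]
            total = sum(weights)
            for (a, b), w in zip(dips, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by number of chromosomes.
        new_freqs = {h: counts[h] / n_chrom for h in haplotypes}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Toy data: two unambiguous AB/AB individuals and one double heterozygote.
toy = [[("AB", "AB")], [("AB", "AB")], [("AB", "ab"), ("Ab", "aB")]]
est = em_haplotype_frequencies(toy)
```

In the toy data the ambiguous individual is resolved almost entirely as AB/ab, because AB is already common among the unambiguous individuals.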
Project description:Many processes of biological diversification can simultaneously affect multiple evolutionary lineages. Examples include multiple members of a gene family diverging when a region of a chromosome is duplicated, multiple viral strains diverging at a "super-spreading" event, and a geological event fragmenting whole communities of species. It is difficult to test for patterns of shared divergences predicted by such processes because all phylogenetic methods assume that lineages diverge independently. We introduce a Bayesian phylogenetic approach to relax the assumption of independent, bifurcating divergences by expanding the space of topologies to include trees with shared and multifurcating divergences. This allows us to jointly infer phylogenetic relationships, divergence times, and patterns of divergences predicted by processes of diversification that affect multiple evolutionary lineages simultaneously or lead to more than two descendant lineages. Using simulations, we find that the method accurately infers shared and multifurcating divergence events when they occur and performs as well as current phylogenetic methods when divergences are independent and bifurcating. We apply our approach to genomic data from two genera of geckos from across the Philippines to test if past changes to the islands' landscape caused bursts of speciation. Unlike previous analyses restricted to only pairs of gecko populations, we find evidence for patterns of shared divergences. By generalizing the space of phylogenetic trees in a way that is independent from the likelihood model, our approach opens many avenues for future research into processes of diversification across the life sciences.
Project description:Although radiotherapy plays a crucial role in treating many cancers, the effect of radiation on tumor evolution remains unclear. We integrated temporal genomic profiling of 120 spatially distinct tumor regions from 20 patients with undifferentiated pleomorphic sarcomas (UPS), longitudinal circulating tumor DNA (ctDNA) analysis, and evolutionary biology computational pipelines to study UPS evolution during tumorigenesis and in response to radiotherapy. Most unirradiated UPS displayed initial linear evolution followed by subsequent branching evolution with distinct mutational processes during early and late development. Using metrics of genetic divergence between regions, we demonstrated evidence of strong selection pressures during UPS development that further increased during radiotherapy. We observed changes in subclone abundance following radiotherapy with subclone contraction tied to alterations in calcium signaling and demonstrated that inhibiting calcium transporters can radiosensitize sarcoma cells. Finally, ctDNA analysis accurately measured subclone abundance and enabled non-invasive monitoring of subclonal changes. These results demonstrate that radiation exerts selective pressures on UPS and suggest that targeting radioresistant subclonal populations could improve outcomes after radiotherapy.
Project description:In malaria, individuals are often infected with different parasite strains. The complexity of infection (COI) is defined as the number of genetically distinct parasite strains in an individual. Changes in the mean COI in a population have been shown to be informative of changes in transmission intensity with a number of probabilistic likelihood and Bayesian models now developed to estimate the COI. However, rapid, direct measures based on heterozygosity or FwS do not properly represent the COI. In this work, we present two new methods that use easily calculated measures to directly estimate the COI from allele frequency data. Using a simulation framework, we show that our methods are computationally efficient and comparably accurate to current approaches in the literature. Through a sensitivity analysis, we characterize how the distribution of parasite densities, the assumed sequencing depth, and the number of sampled loci impact the bias and accuracy of our two methods. Using our developed methods, we further estimate the COI globally from Plasmodium falciparum sequencing data and compare the results against the literature. We show significant differences in the estimated COI globally between continents and a weak relationship between malaria prevalence and COI.
Project description:Background: Obesity is emerging as a global health problem, with more than one-third of the world's adult population being overweight or obese. In this study, we investigated worldwide population differentiation in allele frequencies of obesity-associated SNPs (single nucleotide polymorphisms). Results: We collected a total of 225 obesity-associated SNPs from a public database. Their population-level allele frequencies were derived from the genotype data of the 1000 Genomes Project (phase 3). We used a hypergeometric model to assess whether the effect allele at a given SNP is significantly enriched or depleted in each of the 26 populations surveyed in the 1000 Genomes Project with respect to the overall pooled population. Our results indicate that 195 out of 225 SNPs (86.7%) possess effect alleles significantly enriched or depleted in at least one of the 26 populations. Populations within the same continental group exhibit similar allele enrichment/depletion patterns, whereas inter-continental populations show distinct patterns. Among the 225 SNPs, 15 SNPs cluster in the first intron region of the FTO gene, which is a major gene associated with body-mass index (BMI) and fat mass. African populations exhibit much smaller blocks of LD (linkage disequilibrium) among these 15 SNPs, while European and Asian populations have larger blocks. To estimate the cumulative effect of all variants associated with obesity, we developed the personal composite genetic risk score for obesity. Our results indicate that the East Asian populations have the lowest averages of the composite risk scores, whereas three European populations have the highest averages. In addition, the population-level average of composite genetic risk scores is significantly correlated (R² = 0.35, P = 0.0060) with obesity prevalence. Conclusions: We have detected substantial population differentiation in allele frequencies of obesity-associated SNPs. The results will help elucidate the genetic basis that may contribute to population disparities in obesity prevalence.
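A hypergeometric enrichment/depletion test of the kind described can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' code: the pooled sample is treated as an urn of `N` chromosomes of which `K` carry the effect allele, and one population contributes `n` of those chromosomes, `k` of which carry the effect allele. The function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment(k, N, K, n):
    """Hypothetical sketch: one-sided hypergeometric tail probabilities.

    k : effect-allele chromosomes observed in the population
    N : total chromosomes in the pooled sample
    K : effect-allele chromosomes in the pooled sample
    n : chromosomes sampled from the population
    Returns (P(X >= k), P(X <= k)), the enrichment and depletion p-values.
    """
    lo = max(0, n - (N - K))  # smallest feasible count
    hi = min(n, K)            # largest feasible count
    denom = comb(N, n)
    pmf = {x: comb(K, x) * comb(N - K, n - x) / denom
           for x in range(lo, hi + 1)}
    p_enriched = sum(p for x, p in pmf.items() if x >= k)
    p_depleted = sum(p for x, p in pmf.items() if x <= k)
    return p_enriched, p_depleted


# Toy example: 10 pooled chromosomes, 5 with the effect allele;
# a population of 4 chromosomes in which all 4 carry it.
p_enr, p_dep = hypergeom_enrichment(k=4, N=10, K=5, n=4)
```

With 225 SNPs tested across 26 populations, a correction for multiple testing would be needed on top of these raw tail probabilities. The abstract does not say which correction was used.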
Project description:Scorpion toxins are thought to have originated from ancestral housekeeping genes that underwent diversification and neofunctionalization as a result of positive selection. Our understanding of the evolutionary origin of these peptides is hindered by the patchiness of existing taxonomic sampling. While recent studies have shown phylogenetic inertia in some scorpion toxins at higher systematic levels, the evolutionary dynamics of toxins among closely related taxa remain unexplored. In this study, we used new and previously published transcriptomic resources to assess evolutionary relationships of closely related scorpions from the family Hadruridae and their toxins. In addition, we surveyed the incidence of scorpine-like peptides (SLP, a type of potassium channel toxin), which were previously known from 21 scorpion species. We demonstrate that scorpine-like peptides exhibit gene duplications. Our molecular analyses demonstrate that only eight sites of two SLP copies found in scorpions are evolving under positive selection, with more sites evolving under negative selection, in contrast to previous findings. These results show evolutionary conservation in toxin diversity at a shallow taxonomic scale.
Project description:Natural selection makes evolutionary adaptation possible even if the overwhelming majority of new mutations are deleterious. However, in rapidly evolving populations where numerous linked mutations occur and segregate simultaneously, clonal interference and genetic hitchhiking can limit the efficiency of selection, allowing deleterious mutations to accumulate over time. This can in principle overwhelm the fitness increases provided by beneficial mutations, leading to an overall fitness decline. Here, we analyze the conditions under which evolution will tend to drive populations to higher versus lower fitness. Our analysis focuses on quantifying the boundary between these two regimes, as a function of parameters such as population size, mutation rates, and selection pressures. This boundary represents a state in which adaptation is precisely balanced by Muller's ratchet, and we show that it can be characterized by rapid molecular evolution without any net fitness change. Finally, we consider the implications of global fitness-mediated epistasis, and find that under some circumstances this can drive populations towards the boundary state, which can thus represent a long-term evolutionary attractor.