Project description:Rapid advances in short-read DNA sequencing technologies have revolutionized population genomic studies, but there are genomic regions where this technology reaches its limits. Limitations mostly arise due to the difficulties in assembly or alignment to genomic regions of high sequence divergence and high repeat content, which are typical characteristics of loci under strong long-term balancing selection. Studying genetic diversity at such loci therefore remains challenging. Here, we investigate the feasibility and error rates associated with targeted long-read sequencing of a locus under balancing selection. For this purpose, we generated bacterial artificial chromosomes (BACs) containing the Brassicaceae S-locus, a region under strong negative frequency-dependent selection which has previously proven difficult to assemble in its entirety using short reads. We sequence S-locus BACs with single-molecule long-read sequencing technology and conduct de novo assembly of these S-locus haplotypes. By comparing repeated assemblies resulting from independent long-read sequencing runs on the same BAC clone, we do not detect any structural errors, suggesting that reliable assemblies are generated, but we estimate an indel error rate of 5.7×10⁻⁵. A similar error rate was estimated based on comparison of Illumina short-read sequences and BAC assemblies. Our results show that, until de novo assembly of multiple individuals using long-read sequencing becomes feasible, targeted long-read sequencing of loci under balancing selection is a viable option with low error rates for single nucleotide polymorphisms or structural variation. We further find that short-read sequencing is a valuable complement, allowing correction of the relatively high rate of indel errors that result from this approach.
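The comparison of repeated assemblies of the same BAC clone can be illustrated with a minimal sketch. It assumes the two assemblies have already been aligned to each other (equal-length strings with `-` for gaps) and that an error rate is expressed as events per compared base. The description does not state how the rate was computed, so the function names `count_differences` and `per_base_rate`, the toy sequences, and the choice to count each run of gaps as one indel event are all illustrative assumptions.

```python
def count_differences(aln_a: str, aln_b: str) -> dict:
    """Count mismatches and indel events between two rows of a pairwise alignment.

    Both inputs must have equal length and use '-' for gaps. A run of
    consecutive gap columns in the same sequence counts as one indel event
    (an assumption made for this sketch).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")

    mismatches = 0
    indel_events = 0
    compared = 0     # columns where both sequences carry a base
    prev_gap = None  # which sequence was gapped in the previous informative column

    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column: skip without ending a gap run
        gap = "a" if a == "-" else "b" if b == "-" else None
        if gap is None:
            compared += 1
            if a != b:
                mismatches += 1
        elif gap != prev_gap:
            indel_events += 1  # a new gap run starts in this column
        prev_gap = gap

    return {
        "mismatches": mismatches,
        "indel_events": indel_events,
        "compared_bases": compared,
    }


def per_base_rate(events: int, compared_bases: int) -> float:
    """Events per compared base; raises if nothing was compared."""
    if compared_bases <= 0:
        raise ValueError("no bases were compared")
    return events / compared_bases


if __name__ == "__main__":
    # Toy alignment of two hypothetical assemblies of the same clone.
    stats = count_differences("ACGT--ACGTA", "ACGTTTACG-A")
    print(stats, per_base_rate(stats["indel_events"], stats["compared_bases"]))
```

On real data the aligned rows would come from a whole-assembly aligner, and the denominator would be the total aligned length across all compared BACs.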
Project description:Genetic drift is expected to remove polymorphism from populations over long periods of time, with the rate of polymorphism loss being accelerated when species experience strong reductions in population size. Adaptive forces that maintain genetic variation in populations, or balancing selection, might counteract this process. To understand the extent to which natural selection can drive the retention of genetic diversity, we document genomic variability after two parallel species-wide bottlenecks in the genus Capsella. We find that ancestral variation preferentially persists at immunity-related loci, and that the same collection of alleles has been maintained in different lineages that have been separated for several million years. By reconstructing the evolution of the disease-related locus MLO2b, we find that divergence between ancient haplotypes can be obscured by reference-based re-sequencing methods, and that trans-specific alleles can encode substantially diverged protein sequences. Our data point to long-term balancing selection as an important factor shaping the genetics of immune systems in plants and as the predominant driver of genomic variability after a population bottleneck.
Project description:Balancing selection is an important evolutionary force that maintains genetic and phenotypic diversity in populations. Most studies in humans have focused on long-standing balancing selection, which persists over long periods of time and is generally shared across populations. But balanced polymorphisms can also promote fast adaptation, especially when the environment changes. To better understand the role of previously balanced alleles in novel adaptations, we analyzed in detail four loci as case examples of this mechanism. These loci show hallmark signatures of long-term balancing selection in African populations, but not in Eurasian populations. The disparity between populations is due to changes in allele frequencies, with intermediate frequency alleles in Africans (likely due to balancing selection) segregating instead at low- or high-derived allele frequency in Eurasia. We explicitly tested the support for different evolutionary models with an approximate Bayesian computation approach and show that the patterns in PKDREJ, SDR39U1, and ZNF473 are best explained by recent changes in selective pressure in certain populations. Specifically, we infer that alleles previously under long-term balancing selection, or alleles linked to them, were recently targeted by positive selection in Eurasian populations. Balancing selection thus likely served as a source of functional alleles that mediated subsequent adaptations to novel environments.
Project description:Balancing selection occurs when multiple alleles are maintained in a population, which can result in their preservation over long evolutionary time periods. A characteristic signature of this long-term balancing selection is an excess number of intermediate frequency polymorphisms near the balanced variant. However, the expected distribution of allele frequencies at these loci has not been extensively detailed, and therefore existing summary statistic methods do not explicitly take it into account. Using simulations, we show that new mutations which arise in close proximity to a site targeted by balancing selection accumulate at frequencies nearly identical to that of the balanced allele. In order to scan the genome for balancing selection, we propose a new summary statistic, β, which detects these clusters of alleles at similar frequencies. Simulation studies show that compared with existing summary statistics, our measure has improved power to detect balancing selection, and is reasonably powered in non-equilibrium demographic models and under a range of recombination and mutation rates. We compute β on 1000 Genomes Project data to identify loci potentially subjected to long-term balancing selection in humans. We report two balanced haplotypes, localized to the genes WFS1 and CADM2, that are strongly linked to association signals for complex traits. Our approach is computationally efficient and applicable to species that lack appropriate outgroup sequences, allowing for well-powered analysis of selection in the wide variety of species for which population data are rapidly being generated.
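The idea of detecting clusters of alleles at frequencies similar to that of a focal variant can be illustrated with a toy score. This is not the β statistic itself, whose formula is not given in the description. The function name `frequency_similarity_score`, the weighting scheme, and the `power` parameter are assumptions chosen only to show the intuition that neighboring variants close in frequency to the core variant should contribute more.

```python
def frequency_similarity_score(core_freq, neighbor_freqs, power=2.0):
    """Toy score: mean similarity of neighboring allele frequencies to a core frequency.

    Each neighbor gets weight 1 - (|f - core| / max_dist) ** power, where
    max_dist is the largest possible distance from the core frequency.
    A weight of 1 means identical frequency; 0 means maximally different.
    This is an illustrative stand-in, not the published beta statistic.
    """
    if not 0.0 < core_freq < 1.0:
        raise ValueError("core_freq must be strictly between 0 and 1")
    freqs = list(neighbor_freqs)
    if not freqs:
        return 0.0  # no neighbors in the window, so no evidence of clustering
    if any(f < 0.0 or f > 1.0 for f in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")

    max_dist = max(core_freq, 1.0 - core_freq)
    weights = [1.0 - (abs(f - core_freq) / max_dist) ** power for f in freqs]
    return sum(weights) / len(weights)


if __name__ == "__main__":
    # Hypothetical window: neighbors clustered near the core frequency score high.
    print(frequency_similarity_score(0.5, [0.48, 0.52, 0.5]))
    print(frequency_similarity_score(0.5, [0.02, 0.95, 0.1]))
```

In a genome scan, a score of this kind would be computed in a sliding window around each candidate variant and compared against a neutral expectation.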
Project description:Balancing selection maintains advantageous diversity in populations through various mechanisms. Although extensively explored from a theoretical perspective, an empirical understanding of its prevalence and targets lags behind our knowledge of positive selection. Here, we describe the Non-central Deviation (NCD), a simple yet powerful statistic to detect long-term balancing selection (LTBS) that quantifies how close frequencies are to expectations under LTBS, and provides the basis for a neutrality test. NCD can be applied to a single locus or genomic data, and can be implemented considering only polymorphisms (NCD1) or also considering fixed differences with respect to an outgroup species (NCD2). Incorporating fixed differences improves power, and NCD2 has higher power to detect LTBS in humans under different frequencies of the balanced allele(s) than other available methods. Applied to genome-wide data from African and European human populations, in both cases using chimpanzee as an outgroup, NCD2 shows that, albeit not prevalent, LTBS affects a sizable portion of the genome: ∼0.6% of analyzed genomic windows and 0.8% of analyzed positions. Significant windows (P < 0.0001) contain 1.6% of SNPs in the genome, which disproportionately fall within exons and change protein sequence, but are not enriched in putatively regulatory sites. These windows overlap ∼8% of the protein-coding genes, and these have a larger number of transcripts than expected by chance even after controlling for gene length. Our catalog includes known targets of LTBS but a majority of them (90%) are novel. As expected, immune-related genes are among those with the strongest signatures, although most candidates are involved in other biological functions, suggesting that LTBS potentially influences diverse human phenotypes.
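A statistic that "quantifies how close frequencies are to expectations under LTBS" can be sketched as a root-mean-square deviation of allele frequencies from a target frequency. The sketch below assumes that form, assumes that minor allele frequencies are used, and assumes that fixed differences enter the NCD2 variant as sites with frequency 0. None of these details are stated in the description, and the function name `ncd` and its parameters are hypothetical.

```python
import math


def ncd(minor_allele_freqs, target_freq=0.5, n_fixed_differences=0):
    """Root-mean-square deviation of minor allele frequencies from a target.

    With n_fixed_differences == 0 only polymorphisms are used (NCD1-like).
    With n_fixed_differences > 0, each fixed difference to the outgroup is
    treated as a site with frequency 0 (NCD2-like). Lower values mean the
    window's frequencies sit closer to the target frequency.
    """
    freqs = list(minor_allele_freqs)
    if any(f <= 0.0 or f > 0.5 for f in freqs):
        raise ValueError("minor allele frequencies must lie in (0, 0.5]")
    if not 0.0 < target_freq <= 0.5:
        raise ValueError("target_freq must lie in (0, 0.5]")
    if n_fixed_differences < 0:
        raise ValueError("n_fixed_differences must be non-negative")

    n_sites = len(freqs) + n_fixed_differences
    if n_sites == 0:
        raise ValueError("window contains no informative sites")

    squared_dev = sum((f - target_freq) ** 2 for f in freqs)
    squared_dev += n_fixed_differences * target_freq ** 2  # sites at frequency 0
    return math.sqrt(squared_dev / n_sites)


if __name__ == "__main__":
    window = [0.45, 0.5, 0.4, 0.1]  # hypothetical minor allele frequencies
    print(ncd(window))                         # polymorphisms only
    print(ncd(window, n_fixed_differences=3))  # also counting fixed differences
```

Counting fixed differences raises the value in windows with many substitutions relative to the outgroup, which is one way such a statistic can separate balanced windows from windows that are merely highly variable.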
Project description:Background: The ability to grow in phosphorus-depleted soils is an important trait for rice cultivation in many world regions, especially in the tropics. The Phosphorus Starvation Tolerance 1 (PSTOL1) gene has been identified as underlying the ability of some cultivated rice varieties to grow under low-phosphorus conditions; however, the gene is absent from other varieties. We assessed PSTOL1 presence/absence in a geographically diverse sample of wild, domesticated and weedy rice and sequenced the gene in samples where it is present. Results: We find that the presence/absence polymorphism spans cultivated, weedy and wild Asian rice groups. For the subset of samples that carry PSTOL1, haplotype sequences suggest long-term selective maintenance of functional alleles, but with repeated evolution of loss-of-function alleles through premature stops and frameshift mutations. The loss-of-function alleles have evolved convergently in multiple rice species and cultivated rice varieties. Greenhouse assessments of plant growth under low- and high-phosphorus conditions did not reveal significant associations with PSTOL1 genotype variation; however, the striking signature of balancing selection at this locus suggests that further phenotypic characterizations of PSTOL1 allelic variants are warranted and may be useful for crop improvement. Conclusions: These findings suggest balancing selection for both functional and non-functional PSTOL1 alleles that predates and transcends Asian rice domestication, a pattern that may reflect fitness tradeoffs associated with geographical variation in soil phosphorus content.
Project description:Background: In contrast to positive selection, which reduces genetic variation by fixing beneficial alleles, balancing selection maintains genetic variation within a population or species and plays crucial roles in adaptation in diverse organisms. However, which genes, genome-wide, are under balancing selection and the extent to which these genes are involved in adaptation are largely unknown. Results: We performed a genome-wide scan for genes under balancing selection across two plant species, Arabidopsis thaliana and its relative Capsella rubella, which diverged about 8 million generations ago. Among hundreds of genes with shared coding-region polymorphisms, we find evidence for long-term balancing selection in five genes: AT1G35220, AT2G16570, AT4G29360, AT5G38460, and AT5G44000. These genes are involved in the response to biotic and abiotic stress and other fundamental biochemical processes. More intriguingly, for these genes, we detected significant ecological diversification between the two haplotype groups, suggesting that balancing selection has been very important for adaptation. Conclusions: Our results indicate that beyond the well-known S-locus genes and resistance genes, many loci are under balancing selection. These genes are mostly correlated with resistance to stress or other fundamental functions and likely play a more important role in adaptation to diverse habitats than previously thought.