Weak selection and recent mutational changes influence polymorphic synonymous mutations in humans.
ABSTRACT: Recent large-scale genomic and evolutionary studies have revealed the small but detectable signature of weak selection on synonymous mutations during mammalian evolution, likely acting at the level of translational efficacy (i.e., translational selection). To investigate whether weak selection, and translational selection in particular, plays any role in shaping the fate of synonymous mutations that are present today in human populations, we studied genetic variation at the polymorphic level and patterns of evolution in the human lineage after human-chimpanzee separation. We find evidence that neutral mechanisms are influencing the frequency of polymorphic mutations in humans. Our results suggest a recent increase in mutational tendencies toward AT, observed in all isochores, that is responsible for AT mutations segregating at lower frequencies than GC mutations. In all, however, changes in mutational tendencies and other neutral scenarios are not sufficient to explain a difference between synonymous and noncoding mutations or a difference between synonymous mutations potentially advantageous or deleterious under a translational selection model. Furthermore, several estimates of selection intensity on synonymous mutations all suggest a detectable influence of weak selection acting at the level of translational selection. Thus, random genetic drift, recent changes in mutational tendencies, and weak selection influence the fate of synonymous mutations that are present today as polymorphisms. All of these features, neutral and selective, should be taken into account in evolutionary analyses that often assume constancy of mutational tendencies and complete neutrality of synonymous mutations.
Project description:Background: Synonymous mutations are able to change the tAI (tRNA adaptation index) of a codon and consequently affect the local translation rate. Intuitively, one may hypothesize that those synonymous mutations which increase the tAI values are favored by natural selection. Results: We use the maize (Zea mays) genome to test our assumption. The first supporting evidence is that the tAI-increasing synonymous mutations have higher fixed-to-polymorphic ratios than the tAI-decreasing ones. Next, the DAF (derived allele frequency) or MAF (minor allele frequency) of the former is significantly higher than the latter. Moreover, similar results are obtained when we investigate CAI (codon adaptation index) instead of tAI. Conclusion: The synonymous mutations in the maize genome are not strictly neutral. The tAI-increasing mutations are positively selected while those tAI-decreasing ones undergo purifying selection. This selection force might be weak but should not be automatically ignored.
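The fixed-to-polymorphic comparison described above can be illustrated with a 2x2 contingency table. The following is a minimal sketch, not the maize study's actual pipeline: the function names and the counts are hypothetical, and the one-sided Fisher exact test is an assumed choice of test that the abstract does not specify.

```python
from math import comb


def fixed_to_polymorphic_ratio(fixed, polymorphic):
    """Ratio of fixed differences to segregating polymorphisms for one class."""
    return fixed / polymorphic


def odds_ratio(fixed_up, poly_up, fixed_down, poly_down):
    """Odds ratio of the 2x2 table; values above 1 mean the 'up' class
    (e.g. tAI-increasing mutations) is enriched among fixed differences."""
    return (fixed_up * poly_down) / (poly_up * fixed_down)


def fisher_one_sided(fixed_up, poly_up, fixed_down, poly_down):
    """One-sided Fisher exact p-value: probability of seeing at least
    `fixed_up` fixed sites in the 'up' class with all margins held constant."""
    row_up = fixed_up + poly_up
    total = row_up + fixed_down + poly_down
    col_fixed = fixed_up + fixed_down
    upper = min(row_up, col_fixed)
    tail = sum(
        comb(row_up, k) * comb(total - row_up, col_fixed - k)
        for k in range(fixed_up, upper + 1)
    )
    return tail / comb(total, col_fixed)


# Hypothetical counts for illustration only; not taken from the study.
example = dict(fixed_up=30, poly_up=20, fixed_down=20, poly_down=30)
print("ratio (increasing):", fixed_to_polymorphic_ratio(30, 20))
print("ratio (decreasing):", fixed_to_polymorphic_ratio(20, 30))
print("odds ratio:", odds_ratio(**example))
print("one-sided p:", fisher_one_sided(**example))
```

A higher fixed-to-polymorphic ratio in the tAI-increasing class is the pattern expected if those mutations are, on average, weakly favored. A real analysis would also need to control for mutation bias and demography.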
Project description:Nucleotide variation in an 8.1-kb fragment encompassing the RpII215 gene, which encodes the largest subunit of the RNA polymerase II complex, is analyzed in a sample of 11 chromosomes from a natural population of Drosophila subobscura. No amino acid polymorphism was detected among the 157 segregating sites. The observed numbers of preferred and unpreferred derived synonymous mutations can be explained by neutral mutational processes. In contrast, preferred mutations segregate at significantly higher frequency than unpreferred mutations, suggesting the action of natural selection. The polymorphism to divergence ratio is different for preferred and unpreferred changes, in agreement with their beneficial and deleterious effects on fitness, respectively. Preferred and unpreferred codons are nonrandomly distributed in the RpII215 gene, leading to a heterogeneous distribution of polymorphic to fixed synonymous differences across this coding region. This intragenic variation of the polymorphism/divergence ratio cannot be explained by different patterns of gene expression, mutation, or recombination rates, and therefore it indicates that selection coefficients for synonymous mutations can vary extensively across a coding region. The application of nucleotide composition stationarity tests in coding and flanking noncoding regions, assumed to behave neutrally, allows the detection of the action of natural selection when stationarity holds in the noncoding region.
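The claim that preferred mutations segregate at higher frequency than unpreferred ones amounts to comparing two sets of derived allele counts. The sketch below shows one way to quantify such a difference with a Mann-Whitney U statistic. The abstract does not state which test was used, so this choice, the function names, and the example counts are all assumptions made for illustration.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: number of (x, y) pairs with x > y,
    counting ties as one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def probability_of_superiority(xs, ys):
    """Share of cross-class pairs in which the xs value is larger
    (0.5 means no tendency either way)."""
    return mann_whitney_u(xs, ys) / (len(xs) * len(ys))


# Hypothetical derived allele counts in a sample of 11 chromosomes;
# these numbers are illustrative and not from the RpII215 data.
preferred = [3, 5, 2, 7]
unpreferred = [1, 1, 2, 3]

print("U:", mann_whitney_u(preferred, unpreferred))
print("P(preferred > unpreferred):",
      probability_of_superiority(preferred, unpreferred))
```

Values well above 0.5 would indicate that preferred mutations tend to segregate at higher frequency. Assessing significance would additionally require a null distribution for U, for example from a permutation test.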
Project description:Synonymous codon substitutions are not always selectively neutral as revealed by several types of analyses, including studies of codon usage patterns among genes. We analyzed codon usage in 13 bacterial genomes sampled from across a large order of bacteria, Enterobacterales, and identified presumptively neutral and selected classes of synonymous substitutions. To estimate substitution rates, given a neutral/selected classification of synonymous substitutions, we developed a flexible [Formula: see text] substitution model that allows multiple classes of synonymous substitutions. Under this multiclass synonymous substitution (MSS) model, the denominator of [Formula: see text] includes only the strictly neutral class of synonymous substitutions. On average, the value of [Formula: see text] under the MSS model was 80% of that under the standard codon model in which all synonymous substitutions are assumed to be neutral. The indication is that conventional [Formula: see text] analyses overestimate these values and thus overestimate the frequency of positive diversifying selection and underestimate the strength of purifying selection. To quantify the strength of selection necessary to explain this reduction, we developed a model of selected compensatory codon substitutions. The reduction in synonymous substitution rate, and thus the contribution that selection makes to codon bias variation among genes, can be adequately explained by very weak selection, with a mean product of population size and selection coefficient, [Formula: see text].
Project description:In Drosophila, many studies have examined the short- or long-term evolution occurring across synonymous sites. Few, however, have examined both the recent and long-term evolution to gain a complete view of this selection. Here we have analyzed Drosophila ananassae DNA polymorphism and divergence data using several different methods, and have identified evidence of positive selection favoring preferred codons on both recent and long-term evolutionary time scales. Furthermore, in D. ananassae, selection for preferred codons was stronger on the X chromosome than on the autosomes. We show that this stronger selection is not due to higher gene expression of X-linked genes. Analysis of the selectively neutral introns indicated that the X chromosome also had a preference for GC over AT nucleotides, potentially from GC-biased gene conversion (gcBGC), which can also affect the base composition of synonymous sites. Thus selection for preferred codons and gcBGC both seem to be partially responsible for shaping D. ananassae synonymous site evolution.
Project description:Synonymous mutations are considered to be "silent" as they do not affect protein sequence. However, different silent codons have different translation efficiency (TE), which raises the question to what extent such mutations are really neutral. We perform the first genome-wide study of natural selection operating on TE in recent human evolution, surveying 13,798 synonymous single nucleotide polymorphisms (SNPs) in 1,198 unrelated individuals from 11 populations. We find evidence for both negative and positive selection on TE, as measured based on differentiation in allele frequencies between populations. Notably, the likelihood of an SNP to be targeted by positive or negative selection is correlated with the magnitude of its effect on the TE of the corresponding protein. Furthermore, negative selection acting against changes in TE is more marked in highly expressed genes, highly interacting proteins, complex members, and regulatory genes. It is also more common in functional regions and in the initial segments of highly expressed genes. Positive selection targeting sites with a large effect on TE is stronger in lowly interacting proteins and in regulatory genes. Similarly, essential genes are enriched for negative TE selection while underrepresented for positive TE selection. Taken together, these results point to the significant role of TE as a selective force operating in humans and hence underscore the importance of considering silent SNPs in interpreting associations with complex human diseases. Testifying to this potential, we describe two synonymous SNPs that may have clinical implications in phenylketonuria and in Best's macular dystrophy due to TE differences between alleles.
Project description:BACKGROUND: Synonymous mutations have been identified to play important roles in cancer development, although they do not modify the protein sequences. However, relatively little research has specifically delineated the functionality of synonymous mutations in cancer. RESULTS: We investigated the nucleotide-based and amino acid-based features of synonymous mutations across 15 cancer types from The Cancer Genome Atlas (TCGA), and revealed novel driver candidates by identifying hotspot mutations. First, synonymous mutations were compared between TCGA and the 1000 Genomes Project at the nucleotide and amino acid levels. We found that C:G → T:A transitions were the most frequent single-base substitutions, and leucine underwent the largest number of synonymous mutations in TCGA due to the prevalent C → T transition, which induced the transformation between optimal and non-optimal codons. Next, 97 synonymous hotspot mutations in 86 genes were nominated as candidate drivers with potential cancer risk by considering the mutational rates across different sequence contexts. We observed that the non-CpG-island GC transition sequence context was positively selected across most cancer types, and that the different sequence contexts in which hotspot mutations occur could be significant for genetic differences and functional features. We also found that the hotspots were more conserved than neutral mutations of hotspot-mutation-containing genes and frequently occurred at leucine. In addition, we mapped hotspots, neutral and non-hotspot mutations of hotspot-mutation-containing genes to their respective protein domains and found that the ion transport domain was the most frequent one, which could mediate cell interaction and has relevant implications for tumor therapy. Finally, the signatures of synonymous hotspots were qualitatively similar to those of harmful missense variants.
CONCLUSIONS: We illustrated the preferences of cancer-associated synonymous mutations, especially hotspots, and laid the groundwork for understanding how synonymous mutations act as drivers in cancer.
Project description:Intrapatient evolution of human immunodeficiency virus type 1 (HIV-1) is driven by the adaptive immune system resulting in rapid change of HIV-1 proteins. When cytotoxic CD8(+) T cells or neutralizing antibodies target a new epitope, the virus often escapes via nonsynonymous mutations that impair recognition. Synonymous mutations do not affect this interplay and are often assumed to be neutral. We test this assumption by tracking synonymous mutations in longitudinal intrapatient data from the C2-V5 part of the env gene. We find that most synonymous variants are lost even though they often reach high frequencies in the viral population, suggesting a cost to the virus. Using published data from SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) assays, we find that synonymous mutations that disrupt base pairs in RNA stems flanking the variable loops of gp120 are more likely to be lost than other synonymous changes: these RNA hairpins might be important for HIV-1. Computational modeling indicates that, to be consistent with the data, a large fraction of synonymous mutations in this genomic region need to be deleterious with a cost on the order of 0.002 per day. This weak selection against synonymous substitutions does not result in a strong pattern of conservation in cross-sectional data but slows down the rate of evolution considerably. Our findings are consistent with the notion that large-scale patterns of RNA structure are functionally relevant, whereas the precise base pairing pattern is not.
Project description:The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. While previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.
Project description:Synonymous mutations, which change the DNA sequence but not the encoded protein sequence, can affect protein structure and function, mRNA maturation, and mRNA half-lives. The possibility that synonymous mutations might be enriched in cancer has been explored in several recent studies. However, none of these studies control for all three types of mutational heterogeneity (patient, histology, and gene) that are known to affect the accurate identification of non-synonymous cancer-associated genes. Our goal is to adopt the current standard for non-synonymous mutations in an investigation of synonymous mutations. Here, we create an algorithm, MutSigCVsyn, an adaptation of MutSigCV, to identify cancer-associated genes that are enriched for synonymous mutations based on a non-coding background model that takes into account the mutational heterogeneity across these levels. Using MutSigCVsyn, we first analyzed 2572 cancer whole-genome samples from the Pan-cancer Analysis of Whole Genomes (PCAWG) to identify non-synonymous cancer drivers as a quality control. Indicative of the algorithm's accuracy, we find that 58.6% of these candidate genes were also found in the Cancer Gene Census (CGC) list, and 66.2% were found within the PCAWG cancer driver list. We then applied it to identify 30 putative cancer-associated genes that are enriched for synonymous mutations within the same samples. One of the promising gene candidates is the B cell lymphoma 2 (BCL-2) gene. BCL-2 regulates apoptosis by antagonizing the action of proapoptotic BCL-2 family member proteins. The synonymous mutations in BCL2 are enriched in its anti-apoptotic domain and likely play a role in cancer cell proliferation. Our study introduces MutSigCVsyn, an algorithm that accounts for mutational heterogeneity at patient, histology, and gene levels, to identify cancer-associated genes that are enriched for synonymous mutations using whole genome sequencing data.
We identified 30 putative candidate genes that will benefit from future experimental studies on the role of synonymous mutations in cancer biology.