Dataset Information

The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC.

ABSTRACT: Positive selection is widely estimated from protein coding sequence alignments by the nonsynonymous-to-synonymous ratio ?. Increasingly elaborate codon models are used in a likelihood framework for this estimation. Although there is widespread concern about the robustness of the estimation of the ? ratio, more efforts are needed to estimate this robustness, especially in the context of complex models. Here, we focused on the branch-site codon model. We investigated its robustness on a large set of simulated data. First, we investigated the impact of sequence divergence. We found evidence of underestimation of the synonymous substitution rate for values as small as 0.5, with a slight increase in false positives for the branch-site test. When dS increases further, underestimation of dS is worse, but false positives decrease. Interestingly, the detection of true positives follows a similar distribution, with a maximum for intermediary values of dS. Thus, high dS is more of a concern for a loss of power (false negatives) than for false positives of the test. Second, we investigated the impact of GC content. We showed that there is no significant difference of false positives between high GC (up to ?80%) and low GC (?30%) genes. Moreover, neither shifts of GC content on a specific branch nor major shifts in GC along the gene sequence generate many false positives. Our results confirm that the branch-site is a very conservative test.

SUBMITTER: Gharib WH

PROVIDER: S-EPMC3684852 | biostudies-other | 2013 Jul

REPOSITORIES: biostudies-other

ACCESS DATA

Publications

The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC.

Gharib Walid H WH Robinson-Rechavi Marc M

Molecular biology and evolution 20130404 7

Positive selection is widely estimated from protein coding sequence alignments by the nonsynonymous-to-synonymous ratio ω. Increasingly elaborate codon models are used in a likelihood framework for this estimation. Although there is widespread concern about the robustness of the estimation of the ω ratio, more efforts are needed to estimate this robustness, especially in the context of complex models. Here, we focused on the branch-site codon model. We investigated its robustness on a large set ...[more]

PMID: 23558341

Similar Datasets

Project description:BACKGROUND: Synonymous DNA substitution rates in the plant chloroplast genome are generally relatively slow and lineage dependent. Non-synonymous rates are usually even slower due to purifying selection acting on the genes. Positive selection is expected to speed up non-synonymous substitution rates, whereas synonymous rates are expected to be unaffected. Until recently, positive selection has seldom been observed in chloroplast genes, and large-scale structural rearrangements leading to gene duplications are hitherto supposed to be rare. METHODOLOGY/PRINCIPLE FINDINGS: We found high substitution rates in the exons of the plastid clpP1 gene in Oenothera (the Evening Primrose family) and three separate lineages in the tribe Sileneae (Caryophyllaceae, the Carnation family). Introns have been lost in some of the lineages, but where present, the intron sequences have substitution rates similar to those found in other introns of their genomes. The elevated substitution rates of clpP1 are associated with statistically significant whole-gene positive selection in three branches of the phylogeny. In two of the lineages we found multiple copies of the gene. Neighboring genes present in the duplicated fragments do not show signs of elevated substitution rates or positive selection. Although non-synonymous substitutions account for most of the increase in substitution rates, synonymous rates are also markedly elevated in some lineages. Whereas plant clpP1 genes experiencing negative (purifying) selection are characterized by having very conserved lengths, genes under positive selection often have large insertions of more or less repetitive amino acid sequence motifs. CONCLUSIONS/SIGNIFICANCE: We found positive selection of the clpP1 gene in various plant lineages to correlated with repeated duplication of the clpP1 gene and surrounding regions, repetitive amino acid sequences, and increase in synonymous substitution rates. The present study sheds light on the controversial issue of whether negative or positive selection is to be expected after gene duplications by providing evidence for the latter alternative. The observed increase in synonymous substitution rates in some of the lineages indicates that the detection of positive selection may be obscured under such circumstances. Future studies are required to explore the functional significance of the large inserted repeated amino acid motifs, as well as the possibility that synonymous substitution rates may be affected by positive selection.

Dataset Information

The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC.

Publications

The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets