Dataset Information

A novel framework for evaluating the performance of codon usage bias metrics.

ABSTRACT: The unequal utilization of synonymous codons affects numerous cellular processes including translation rates, protein folding and mRNA degradation. In order to understand the biological impact of variable codon usage bias (CUB) between genes and genomes, it is crucial to be able to accurately measure CUB for a given sequence. A large number of metrics have been developed for this purpose, but there is currently no way of systematically testing the accuracy of individual metrics or knowing whether metrics provide consistent results. This lack of standardization can result in false-positive and false-negative findings if underpowered or inaccurate metrics are applied as tools for discovery. Here, we show that the choice of CUB metric impacts both the significance and measured effect sizes in numerous empirical datasets, raising questions about the generality of findings in published research. To bring about standardization, we developed a novel method to create synthetic protein-coding DNA sequences according to different models of codon usage. We use these benchmark sequences to identify the most accurate and robust metrics with regard to sequence length, GC content and amino acid heterogeneity. Finally, we show how our benchmark can aid the development of new metrics by providing feedback on its performance compared to the state of the art.

SUBMITTER: Liu SS

PROVIDER: S-EPMC5805967 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

A novel framework for evaluating the performance of codon usage bias metrics.

Sophia S Liu, Adam J Hockenberry, Michael C Jewett, Luís A N Amaral

Journal of the Royal Society, Interface, 2018-01-01, 138

PMID: 29386398

Similar Datasets

Project description: Codon usage bias (CUB), the preferential use of one of the synonymous codons, has been described in a wide range of organisms from bacteria to mammals, but it has not yet been studied in marine phytoplankton. CUB is thought to be caused by weak selection for translational accuracy and efficiency. Weak selection can overpower genetic drift only in species with large effective population sizes, such as Drosophila, which has relatively strong CUB, while organisms with smaller population sizes (e.g., mammals) have weak CUB. Marine plankton species tend to have extremely large populations, suggesting that CUB should be very strong. Here we test this prediction and describe the patterns of codon usage in a wide range of diatom species belonging to 35 genera from 4 classes. We report that most of the diatom species studied have surprisingly modest CUB (mean Effective Number of Codons, ENC = 56), with some exceptions showing stronger codon bias (ENC = 44). Modest codon bias in most studied diatom species may reflect extreme disparity between astronomically large census and modest effective population size (Ne), with fluctuations in population size and linked selection limiting long-term Ne and rendering selection for optimal codons less efficient. For example, genetic diversity (pi ~0.02 at silent sites) in Skeletonema marinoi corresponds to Ne of about 10 million individuals, which is likely many orders of magnitude lower than its census size. Still, Ne ~10^7 should be large enough to make selection for optimal codons efficient. Thus, we propose that an alternative process, frequent changes of preferred codons, may be a more plausible reason for low CUB despite highly efficient selection for preferred codons in diatom populations. The shifts in the set of optimal codons should result in changes in the direction of selection for codon usage, so the actual codon usage never catches up with the moving target of the optimal set of codons and the species never develop strong CUB. Indeed, we detected strong shifts in preferential codon usage within some diatom genera, with switches between preferentially GC-rich and AT-rich 3rd codon positions (GC3). For example, GC3 ranges from 0.6 to 1 in most Chaetoceros species, while for Chaetoceros dichaeta GC3 = 0.1. Both variation in selection intensity and mutation spectrum may drive such shifts in codon usage and limit the observed CUB. Our study represents the first genome-wide analysis of CUB in diatoms and the first such analysis for a major phytoplankton group.
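
For readers unfamiliar with GC3, the quantity quoted above can be sketched as the fraction of codons whose third base is G or C. The snippet below is only an illustration under stated assumptions: the function name `gc3` is hypothetical, the input is assumed to be an in-frame coding sequence, and any trailing partial codon is ignored. It is not the procedure used in the study described above, which may, for example, exclude stop codons or non-degenerate codons.

```python
def gc3(cds: str) -> float:
    """Fraction of codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence (hypothetical helper,
    not taken from the study). Trailing bases that do not fill a whole
    codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop any incomplete final codon
    third = seq[2:usable:3]            # every third base, starting at index 2
    if not third:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third) / len(third)


# Codons: ATG GCC AAG TAA -> third bases G, C, G, A
print(gc3("ATGGCCAAGTAA"))
```

A value near 1 indicates GC-rich third positions and a value near 0 indicates AT-rich third positions, which is the contrast drawn between most Chaetoceros species and Chaetoceros dichaeta.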
