Dataset Information

Nonsynonymous Polymorphism Counts in Bacterial Genomes: a Comparative Examination.


ABSTRACT: Genomic data reveal single-nucleotide polymorphisms (SNPs) that may carry information about the evolutionary history of bacteria. However, it remains unclear what inferences about selection can be made from genomic SNP data. Bacterial species are often sampled during epidemic outbreaks or within hosts during the course of chronic infections. SNPs obtained from genomic analysis of these data are not necessarily fixed. Treating them as fixed during analysis by using measures such as the ratio of nonsynonymous to synonymous evolutionary changes (dN/dS) may lead to incorrect inferences about the strength and direction of selection. In this study, we consider data from a range of whole-genome sequencing (WGS) studies of bacterial pathogens and explore patterns of nonsynonymous variation to assess whether evidence of selection can be identified by investigating SNP counts alone across multiple WGS studies. We visualize these SNP data in ways that highlight their relationship to neutral baseline expectations. These neutral expectations are based on a simple model of mutation, from which we simulate SNP accumulation to investigate how SNP counts are distributed under alternative assumptions about positive and negative selection. We compare these patterns with empirical SNP data and illustrate the general difficulty of detecting positive selection from SNP data. Finally, we consider whether SNP counts observed at the between-host population level differ from those observed at the within-host level and find some evidence that suggests that dynamics across these two scales are driven by different underlying processes.

IMPORTANCE: Identifying selection from SNP data obtained from whole-genome sequencing studies is challenging. Some current measures used to identify and quantify selection acting on genomes rely on fixed differences; thus, these are inappropriate for SNP data where variants are not fixed. With the increase in whole-genome sequencing studies, it is important to consider SNP data in the context of evolutionary processes. How SNPs are counted and analyzed can help in understanding mutation accumulation and trajectories of strains. We developed a tool for identifying possible evidence of selection and for comparative analysis with other SNP data. We propose a model that provides a rule-of-thumb guideline and two new visualization techniques that can be used to interpret and compare SNP data. We quantify the expected proportion of nonsynonymous SNPs in coding regions under neutrality and demonstrate its use in identifying evidence of positive and negative selection from simulations and empirical data.
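The idea of an expected proportion of nonsynonymous SNPs under neutrality can be illustrated with a minimal Python sketch. This is not the authors' model or tool: it assumes the standard genetic code, equal usage of all 61 sense codons, and equal rates for every single-nucleotide change, all of which are simplifying assumptions introduced here. The names `classify_neighbours` and `neutral_nonsyn_fraction` are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_neighbours():
    """Classify every single-nucleotide change away from each sense codon.

    Assumption (not from the source): all sense codons and all base
    substitutions are weighted equally.
    """
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # mutations starting from stop codons are ignored
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODE[mutant]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["missense"] += 1
    return counts


def neutral_nonsyn_fraction(counts, include_nonsense=True):
    """Fraction of single-nucleotide changes that alter the encoded protein."""
    nonsyn = counts["missense"] + (counts["nonsense"] if include_nonsense else 0)
    total = counts["synonymous"] + nonsyn
    return nonsyn / total


if __name__ == "__main__":
    counts = classify_neighbours()
    print(counts)
    print(round(neutral_nonsyn_fraction(counts), 3))
```

Under these assumptions, roughly three quarters of random coding mutations are nonsynonymous. An observed proportion well below such a baseline would be consistent with negative selection removing nonsynonymous variants, while a proportion above it could hint at positive selection. A realistic baseline would also need to account for codon usage and mutation bias in the genome under study.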

SUBMITTER: Loo SL 

PROVIDER: S-EPMC7755258 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC2734133 | biostudies-literature
| S-EPMC3674579 | biostudies-literature
| S-EPMC4755562 | biostudies-literature
| S-EPMC5643016 | biostudies-literature
| PRJNA700858 | ENA
| S-EPMC2262894 | biostudies-literature
| PRJNA388689 | ENA
| S-EPMC2072959 | biostudies-literature
| S-EPMC3618353 | biostudies-other
| PRJEB7314 | ENA