Dataset Information

Genomic fluidity: an integrative view of gene diversity within microbial populations.


ABSTRACT:

Background

The dual concepts of pan and core genomes have been widely adopted as means to assess the distribution of gene families within microbial species and genera. The core genome is the set of genes shared by a group of organisms; the pan genome is the set of all genes seen in any of these organisms. A variety of methods have provided drastically different estimates of the sizes of pan and core genomes from sequenced representatives of the same groups of bacteria.
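As a minimal sketch of these two definitions, assume each genome is represented as a set of gene-family identifiers. The core genome is then the intersection of those sets and the pan genome is their union. The isolate names and family labels below are hypothetical toy data, not data from the study.

```python
def core_genome(genomes):
    """Gene families present in every genome (set intersection)."""
    if not genomes:
        raise ValueError("need at least one genome")
    return set.intersection(*genomes.values())


def pan_genome(genomes):
    """Gene families present in at least one genome (set union)."""
    if not genomes:
        raise ValueError("need at least one genome")
    return set.union(*genomes.values())


# Hypothetical toy data: isolate name -> set of gene-family identifiers.
genomes = {
    "isolate_A": {"f1", "f2", "f3"},
    "isolate_B": {"f1", "f2", "f4"},
    "isolate_C": {"f1", "f3", "f5"},
}

core = core_genome(genomes)  # families shared by all three isolates
pan = pan_genome(genomes)    # families seen in any isolate
```

Adding a further isolate can only shrink the core set and grow the pan set, which is why both sizes depend on how many genomes have been sampled.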

Results

We use a combination of mathematical, statistical and computational methods to show that current predictions of pan and core genome sizes may have no correspondence to true values. Pan and core genome size estimates are problematic because they depend on estimating the occurrence of rare genes and rare genomes, respectively, and the frequency of rare events is inherently difficult to estimate precisely. Instead, we introduce and evaluate a robust metric, genomic fluidity, to categorize the gene-level similarity among groups of sequenced isolates. Genomic fluidity is a measure of the dissimilarity of genomes evaluated at the gene level.
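The abstract does not state a formula, so the following is only a sketch of one way such a gene-level dissimilarity can be computed. It assumes a pairwise form: for each pair of genomes, the number of gene families found in only one of the two is divided by the total number of gene families in both, and the ratio is averaged over all pairs. The function name `genomic_fluidity` and the toy data are hypothetical.

```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Mean pairwise ratio of unshared to total gene families.

    `genomes` is a list of sets of gene-family identifiers.
    This pairwise-average form is an assumption of this sketch.
    """
    pairs = list(combinations(genomes, 2))
    if not pairs:
        raise ValueError("need at least two genomes")
    total = 0.0
    for a, b in pairs:
        size = len(a) + len(b)
        if size == 0:
            raise ValueError("cannot compare two empty genomes")
        unshared = len(a ^ b)  # families present in exactly one genome
        total += unshared / size
    return total / len(pairs)


# Hypothetical toy data: three isolates, three gene families each.
toy = [
    {"f1", "f2", "f3"},
    {"f1", "f2", "f4"},
    {"f1", "f3", "f5"},
]
phi = genomic_fluidity(toy)
```

Under this form the value is 0 for identical genomes and 1 for genomes that share no gene families. Because it is an average over pairs rather than a count of rare items, it does not hinge on the rarest genes in the sample.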

Conclusions

The genomic fluidity of a population can be estimated accurately given a small number of sequenced genomes. Further, the genomic fluidity of groups of organisms can be compared robustly despite variation in algorithms used to identify genes and their homologs. As such, we recommend that genomic fluidity be used in place of pan and core genome size estimates when assessing gene diversity within genomes of a species or a group of closely related organisms.

SUBMITTER: Kislyuk AO 

PROVIDER: S-EPMC3030549 | biostudies-literature | 2011 Jan

REPOSITORIES: biostudies-literature

Publications

Genomic fluidity: an integrative view of gene diversity within microbial populations.

Kislyuk AO, Haegeman B, Bergman NH, Weitz JS

BMC Genomics, 2011 Jan 13


Similar Datasets

S-EPMC1888801 | biostudies-other
S-EPMC5243131 | biostudies-literature
S-EPMC5442188 | biostudies-literature
S-EPMC5742349 | biostudies-literature
S-EPMC98949 | biostudies-literature
S-EPMC2515634 | biostudies-literature
S-EPMC5345834 | biostudies-literature
S-EPMC6385308 | biostudies-other
S-EPMC10614936 | biostudies-literature
S-EPMC1828824 | biostudies-literature