Dataset Information

Kullback Leibler divergence in complete bacterial and phage genomes.


ABSTRACT: The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback-Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses.
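
The metric described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the restriction to the 20 standard amino acids, the base-2 logarithm, and the use of an unweighted mean across genomes as the reference distribution are all assumptions made here for the example.

```python
import math
from collections import Counter

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(proteins):
    """Relative amino acid frequencies over all proteins of one genome."""
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits (log base 2 assumed)."""
    # Terms with p[aa] == 0 contribute nothing and are skipped.
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS if p[aa] > 0)


def kld_from_mean(genomes):
    """Divergence of each genome's amino acid profile from the mean profile.

    `genomes` maps a genome name to a list of protein sequences.
    Assumption: the mean is an unweighted average over genomes.
    """
    freqs = {name: aa_frequencies(prots) for name, prots in genomes.items()}
    mean = {aa: sum(f[aa] for f in freqs.values()) / len(freqs) for aa in AMINO_ACIDS}
    # mean[aa] > 0 whenever any genome has p[aa] > 0, so the ratio is defined.
    return {name: kl_divergence(f, mean) for name, f in freqs.items()}
```

Under these assumptions, a genome whose composition matches the mean scores 0, and the score grows as its amino acid usage becomes more skewed relative to the other genomes in the set.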

SUBMITTER: Akhter S 

PROVIDER: S-EPMC5712468 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature

Publications

Kullback Leibler divergence in complete bacterial and phage genomes.

Akhter S, Aziz RK, Kashef MT, Ibrahim ES, Bailey B, Edwards RA

PeerJ, 2017-11-30


Similar Datasets

| S-EPMC3538811 | biostudies-literature
| S-EPMC4613467 | biostudies-literature
| S-EPMC6461446 | biostudies-literature
| S-EPMC6988219 | biostudies-literature
| S-EPMC8872895 | biostudies-literature
| S-EPMC5066877 | biostudies-literature
| S-EPMC7567735 | biostudies-literature
| S-EPMC6906390 | biostudies-literature
| S-EPMC10557483 | biostudies-literature
| S-EPMC4498404 | biostudies-literature