Dataset Information

Protein languages differ depending on microorganism lifestyle.


ABSTRACT: Few quantitative measures of genome architecture or organization exist to support assumptions of differences between microorganisms that are broadly defined as being free-living or pathogenic. General principles about complete proteomes exist for codon usage, amino acid biases and essential or core genes. Genome-wide shifts in amino acid usage between free-living and pathogenic microorganisms result in fundamental differences in the complexity of their respective proteomes that are independent of size and gene content. These differences are evident across broad phylogenetic groups, a result of environmental factors and population genetic forces rather than phylogenetic distance. A novel comparative analysis of amino acid usage, utilizing linguistic analyses of word frequency in language and text, identified a global pattern of higher peptide word repetition in 376 free-living versus 421 pathogen genomes across broad ranges of genome size, G+C content and phylogenetic ancestry. This imprint of repetitive word usage indicates free-living microorganisms have a bias for repetitive sequence usage compared to pathogens. These findings quantify fundamental differences in microbial genomes relative to life-history function.
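The abstract does not state the word length or the repetition statistic used, so the following is only a minimal sketch of the general idea of counting repeated peptide "words" in a proteome. The function names `peptide_words` and `word_repetition`, the default word length `k = 3`, and the repetition score (one minus the ratio of distinct words to total words) are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def peptide_words(sequence: str, k: int = 3) -> list[str]:
    """Split a protein sequence into overlapping words of length k (assumed k)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def word_repetition(proteome: list[str], k: int = 3) -> float:
    """Illustrative repetition score, not the published statistic.

    Returns 1 - (distinct words / total words): 0.0 when every word is
    unique, approaching 1.0 when the same words are reused heavily.
    """
    counts = Counter()
    for protein in proteome:
        counts.update(peptide_words(protein, k))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total


# Toy sequences invented for illustration only (not real proteins).
repetitive = ["AAAAAA"]
diverse = ["ACDEFG"]
print(word_repetition(repetitive), word_repetition(diverse))
```

Under these assumptions, a proteome that reuses the same short peptide words scores higher than one of equal length built from distinct words, which is the kind of contrast the abstract reports between free-living and pathogen genomes.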

SUBMITTER: Grzymski JJ 

PROVIDER: S-EPMC4020791 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Protein languages differ depending on microorganism lifestyle.

Grzymski JJ, Marsh AG

PLoS ONE, 2014-05-14, issue 5


Similar Datasets

S-EPMC6754359 | biostudies-literature
S-EPMC7368822 | biostudies-literature
S-EPMC8106315 | biostudies-literature
S-EPMC5042114 | biostudies-literature
S-EPMC3132300 | biostudies-literature
S-EPMC6996405 | biostudies-literature
S-EPMC4065413 | biostudies-literature
S-EPMC6454624 | biostudies-literature
S-EPMC5690848 | biostudies-literature
S-EPMC6862301 | biostudies-literature