Protein languages differ depending on microorganism lifestyle.
ABSTRACT: Few quantitative measures of genome architecture or organization exist to support the assumed differences between microorganisms broadly defined as free-living or pathogenic. General principles about complete proteomes exist for codon usage, amino acid biases, and essential or core genes. Genome-wide shifts in amino acid usage between free-living and pathogenic microorganisms result in fundamental differences in the complexity of their respective proteomes, differences that are independent of size and gene content. These differences are evident across broad phylogenetic groups, a result of environmental factors and population genetic forces rather than phylogenetic distance. A novel comparative analysis of amino acid usage, which adapts linguistic analyses of word frequency in language and text, identified a global pattern of higher peptide word repetition in 376 free-living genomes than in 421 pathogen genomes across broad ranges of genome size, G+C content, and phylogenetic ancestry. This imprint of repetitive word usage indicates that free-living microorganisms have a bias toward repetitive sequence usage compared with pathogens. These findings quantify fundamental differences in microbial genomes relative to life-history function.
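The abstract does not say how peptide "words" are defined or how repetition is scored. As a minimal sketch only, assuming a word is an overlapping k-mer of amino acid letters (k = 3 here, an arbitrary illustrative choice) and scoring repetition as one minus the ratio of distinct words to total word occurrences, a proteome's repetitiveness and its rank-frequency (Zipf-style) profile could be computed as below. The function names, the value of k, and the metric are hypothetical and are not the authors' method.

```python
from collections import Counter


def peptide_words(seq: str, k: int = 3) -> list[str]:
    """Overlapping k-letter peptide 'words' (hypothetical definition; k=3 is illustrative)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_counts(proteins: list[str], k: int = 3) -> Counter:
    """Pool word counts over every protein sequence in a proteome."""
    counts: Counter = Counter()
    for protein in proteins:
        counts.update(peptide_words(protein, k))
    return counts


def repetition_fraction(proteins: list[str], k: int = 3) -> float:
    """Hypothetical repetition score: 1 - (distinct words / total word occurrences).

    0.0 means every word occurs once; values near 1.0 mean a few words dominate.
    """
    counts = word_counts(proteins, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total


def rank_frequency(proteins: list[str], k: int = 3) -> list[int]:
    """Word frequencies sorted from most to least common, as in a Zipf plot."""
    return [n for _, n in word_counts(proteins, k).most_common()]


if __name__ == "__main__":
    # Toy sequences invented for illustration; not real proteomes.
    repetitive = ["MAAAAAAK", "MAAAAK"]
    diverse = ["MACDEFGHIK"]
    print(round(repetition_fraction(repetitive), 3))  # 0.7
    print(repetition_fraction(diverse))               # 0.0
    print(rank_frequency(repetitive))                 # [6, 2, 2]
```

A distinct-to-total ratio is used here because it is simple and size-sensitive in an obvious way; a real comparison across genomes of different sizes would need a size-normalized measure, which the abstract implies but does not specify.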
SUBMITTER: Grzymski JJ
PROVIDER: S-EPMC4020791 | biostudies-literature | 2014
REPOSITORIES: biostudies-literature