Dataset Information

Phylogenomics of prokaryotic ribosomal proteins.

ABSTRACT: Archaeal and bacterial ribosomes contain more than 50 proteins, including 34 that are universally conserved in the three domains of cellular life (bacteria, archaea, and eukaryotes). Despite the high sequence conservation, annotation of ribosomal (r-) protein genes is often difficult because of their short lengths and biased sequence composition. We developed an automated computational pipeline for identification of r-protein genes and applied it to 995 completely sequenced bacterial and 87 archaeal genomes available in the RefSeq database. The pipeline employs curated seed alignments of r-proteins to run position-specific scoring matrix (PSSM)-based BLAST searches against six-frame genome translations, mitigating possible gene annotation errors. As a result of this analysis, we performed a census of prokaryotic r-protein complements, enumerated missing and paralogous r-proteins, and analyzed the distributions of ribosomal protein genes among chromosomal partitions. Phyletic patterns of bacterial and archaeal r-protein genes were mapped to phylogenetic trees reconstructed from concatenated alignments of r-proteins to reveal the history of likely multiple independent gains and losses. These alignments, available for download, can be used as search profiles to improve genome annotation of r-proteins and for further comparative genomics studies.
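
The six-frame translation step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`reverse_complement`, `translate`, `six_frame_translations`) are hypothetical, the standard genetic code is assumed, and the PSSM-based BLAST search itself is not shown. It only demonstrates how a nucleotide sequence yields six candidate protein sequences, so that a search does not depend on existing gene annotations.

```python
# Illustrative sketch only: translate a DNA sequence in all six reading frames.
# Assumes the standard genetic code; names here are hypothetical, not from the paper.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons enumerated in TCAG order match the amino acid string above position by position.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> dict:
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are kept as `*` in the output, so short open reading frames, such as those encoding small r-proteins, remain visible in every frame.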

SUBMITTER: Yutin N

PROVIDER: S-EPMC3353972 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature

Publications

Phylogenomics of prokaryotic ribosomal proteins.

Natalya Yutin, Pere Puigbò, Eugene V. Koonin, Yuri I. Wolf

PLoS ONE, 16 May 2012, issue 5

PMID: 22615861

Similar Datasets

Project description: Background: The SPFH protein superfamily is a diverse family of proteins whose eukaryotic members are involved in the scaffolding of detergent-resistant microdomains. Recently the origin of the SPFH proteins has been questioned. Instead, convergent evolution has been proposed. However, an independent, convergent evolution of three large prokaryotic and three eukaryotic families is highly unlikely, especially when other mechanisms such as lateral gene transfer, which could also explain their distribution pattern, have not yet been considered. To gain better insight into this very diverse protein family, we have analyzed the genomes of 497 microorganisms and investigated the pattern of occurrence as well as the genomic vicinity of the prokaryotic SPFH members. Results: According to sequence and operon structure, a clear division into 12 subfamilies was evident. Three subfamilies (SPFH1, SPFH2 and SPFH5) show a conserved operon structure and two additional subfamilies are linked to those three through functional aspects (SPFH1, SPFH3, SPFH4: interaction with FtsH protease). Therefore these subgroups most likely share common ancestry. The complex pattern of occurrence among the different phyla is indicative of lateral gene transfer. Organisms that do not possess a single SPFH protein are almost exclusively endosymbionts or endoparasites. Conclusion: The conserved operon structure and functional similarities suggest that at least 5 subfamilies that encompass almost 75% of all prokaryotic SPFH members share a common origin. Their similarity to the different eukaryotic SPFH families, as well as functional similarities, suggests that the eukaryotic SPFH families originated from different prokaryotic SPFH families rather than one. This explains the difficulties in obtaining a consistent phylogenetic tree of the eukaryotic SPFH members. Phylogenetic evidence points towards lateral gene transfer as one source of the very diverse patterns of occurrence in bacterial species.

Project description: Background: The structural and functional features associated with Simple Sequence Proteins (SSPs) are non-globularity, disease states, signaling and post-translational modification. SSPs are also an important source of genetic and possibly phenotypic variation. Analysis of 249 prokaryotic proteomes offers a new opportunity to examine the genomic properties of SSPs. Results: SSPs are a minority but they grow with proteome size. This relationship is exhibited across species varying in genomic GC, mutational bias, life style, and pathogenicity. Their proportion in each proteome is strongly influenced by genomic base compositional bias. In most species simple duplication is favoured, but in a few cases, such as Mycobacteria, large families of duplications occur. Amino acid preference in SSPs exhibits a trend towards low cost of biosynthesis. In SSPs and in non-SSPs, Alanine, Glycine, Leucine, and Valine are abundant in species widely varying in genomic GC, whereas Isoleucine and Lysine are rich only in organisms with low genomic GC. Arginine is abundant in SSPs of two species and in the non-SSPs of Xanthomonas oryzae. Asparagine is abundant only in SSPs of low GC species. Aspartic acid is abundant only in the non-SSPs of Halobacterium sp NRC1. The abundance of Serine in SSPs of 62 species extends over a broader range compared to that of non-SSPs. Threonine is abundant only in SSPs of a couple of species. SSPs exhibit preferential association with Cell surface, Cell membrane and Transport functions and a negative association with Metabolism. Mesophiles and Thermophiles display similar ranges in the content of SSPs. Conclusion: Although SSPs are a minority, the genomic forces of base compositional bias and duplications influence their growth and pattern in each species. The preferences and abundance of amino acids are governed by low biosynthetic cost, evolutionary age and base composition of codons. Abundance of charged amino acids Arginine and Aspartic acid is severely restricted. SSPs preferentially associate with cell surface and interface functions as opposed to metabolism, wherein proteins of high sequence complexity with globular structures are preferred. Mesophiles and Thermophiles are similar with respect to the content of SSPs. Our analysis serves to expand the commonly held views on SSPs.

Project description: Background: Some mobile genetic elements target the lagging strand template during DNA replication. Bacterial examples are insertion sequences IS608 and ISDra2 (IS200/IS605 family members). They use obligatory single-stranded circular DNA intermediates for excision and insertion and encode a transposase, TnpAIS200, which recognizes subterminal secondary structures at the insertion sequence ends. Similar secondary structures, Repeated Extragenic Palindromes (REP), are present in many bacterial genomes. TnpAIS200-related proteins, TnpAREP, have been identified and could be responsible for REP sequence proliferation. These proteins share a conserved HuH/Tyrosine core domain responsible for catalysis and are involved in processes of ssDNA cleavage and ligation. Our goal is to characterize the diversity of these proteins, collectively referred to as the TnpAY1 family. Results: A genome-wide analysis of sequences similar to TnpAIS200 and TnpAREP in prokaryotes revealed a large number of family members with a wide taxonomic distribution. These can be arranged into three distinct classes and 12 subclasses based on sequence similarity. One subclass includes sequences similar to TnpAIS200. Proteins from other subclasses are not associated with typical insertion sequence features. These are characterized by specific additional domains possibly involved in protein/DNA or protein/protein interactions. Their genes are found in more than 25% of species analyzed. They exhibit a patchy taxonomic distribution consistent with dissemination by horizontal gene transfers followed by loss. The tnpAREP genes of five subclasses are flanked by typical REP sequences in a REPtron-like arrangement. Four distinct REP types were characterized with a subclass-specific distribution. Other subclasses are not associated with REP sequences but have a large conserved domain located at the C-terminal end of their sequence. This unexpected diversity suggests that, while most likely involved in processing single-strand DNA, proteins from different subfamilies may play a number of different roles. Conclusions: We established a detailed classification of TnpAY1 proteins, consolidated by the analysis of the conserved core domains and the characterization of additional domains. The data obtained illustrate the unexpected diversity of the TnpAY1 family and provide a strong framework for future evolutionary and functional studies. By their potential function in ssDNA editing, they may confer adaptive responses to host cell physiology and metabolism.

Project description: Members of the ancient family of Argonaute (Ago) proteins are present in all domains of life. The common feature of Ago proteins is the ability to bind small nucleic acid guides and use them for sequence-specific recognition, and sometimes cleavage, of complementary targets. While eukaryotic Ago (eAgo) proteins are key players in RNA interference and related pathways, the properties and functions of these proteins in archaeal and bacterial species have just started to emerge. We undertook comprehensive exploration of prokaryotic Ago (pAgo) proteins in sequenced genomes and revealed their striking diversity in comparison with eAgos. Many pAgos contain divergent variants of the conserved domains involved in interactions with nucleic acids, while having extra domains that are absent in eAgos, suggesting that they might have unusual specificities in nucleic acid recognition and cleavage. Many pAgos are associated with putative nucleases, helicases, and DNA binding proteins in the same gene or operon, suggesting that they are involved in target processing. The great variability of pAgos revealed by our analysis opens new ways for exploration of their functions in host cells and for their use as potential tools in genome editing. IMPORTANCE: The eukaryotic Ago proteins and the RNA interference pathways they are involved in are widely used as a powerful tool in research and as potential therapeutics. In contrast, the properties and functions of prokaryotic Ago (pAgo) proteins have remained poorly understood. Understanding the diversity and functions of pAgos holds a huge potential for discovery of new cellular pathways and novel tools for genome manipulations. Only a few pAgos have been characterized by structural or biochemical approaches, while previous genomic studies discovered about 300 proteins in archaeal and eubacterial genomes. Since that time the number of bacterial strains with sequenced genomes has greatly expanded, and many previously sequenced genomes have been revised. We undertook comprehensive analysis of pAgo proteins in sequenced genomes and almost tripled the number of known genes of this family. Our research thus forms a foundation for further experimental characterization of pAgo functions that will be important for understanding of the basic biology of these proteins and their adoption as a potential tool for genome engineering in the future.

Project description: Background: One of the stranger phenomena that can occur during gene translation is where, as a ribosome reads along the mRNA, various cellular and molecular properties contribute to stalling the ribosome on a slippery sequence, shifting the ribosome into one of the other two alternate reading frames. The alternate frame has different codons, so different amino acids are added to the peptide chain, but more importantly, the original stop codon is no longer in-frame, so the ribosome can bypass the stop codon and continue to translate the codons past it. This produces a longer version of the protein, a fusion of the original in-frame amino acids, followed by all the alternate frame amino acids. There is currently no automated software to predict the occurrence of these programmed ribosomal frameshifts (PRF), and they are currently only identified by manual curation. Results: Here we present PRFect, an innovative machine-learning method for the detection and prediction of PRFs in coding genes of various types. PRFect combines advanced machine learning techniques with the integration of multiple complex cellular properties, such as secondary structure, codon usage, ribosomal binding site interference, direction, and slippery site motif. Calculating and incorporating these diverse properties posed significant challenges, but through extensive research and development, we have achieved a user-friendly approach. The code for PRFect is freely available, open-source, and can be easily installed via a single command in the terminal. Our comprehensive evaluations on diverse organisms, including bacteria, archaea, and phages, demonstrate PRFect's strong performance, achieving high sensitivity, specificity, and an accuracy exceeding 90%. Conclusion: PRFect represents a significant advancement in the field of PRF detection and prediction, offering a powerful tool for researchers and scientists to unravel the intricacies of programmed ribosomal frameshifting in coding genes.
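
The effect of a frameshift described in the entry above, where the original stop codon drops out of frame and translation continues, can be shown with a toy Python sketch. This is not PRFect and does not predict anything: the sequence `MRNA`, the slip position, and the function names are invented for illustration, the sequence is written in the DNA alphabet, and the standard genetic code is assumed.

```python
# Toy illustration of a ribosomal frameshift; not PRFect and not a predictor.
# The sequence, slip position, and function names are hypothetical.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate_until_stop(seq, start=0, end=None):
    """Translate complete codons in [start, end), halting at the first stop codon."""
    end = len(seq) if end is None else end
    peptide = []
    for i in range(start, end - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def translate_with_frameshift(seq, slip_pos, shift):
    """Translate in frame up to slip_pos, then resume reading at slip_pos + shift.

    slip_pos must be a codon boundary in the original frame; shift is -1 or +1.
    """
    in_frame_part = translate_until_stop(seq, 0, slip_pos)
    shifted_part = translate_until_stop(seq, slip_pos + shift)
    return in_frame_part + shifted_part


MRNA = "ATGAAATAAGGCTGA"  # in frame: ATG AAA TAA (stop) ...

if __name__ == "__main__":
    print("no shift:", translate_until_stop(MRNA))
    print("-1 shift:", translate_with_frameshift(MRNA, 6, -1))
    print("+1 shift:", translate_with_frameshift(MRNA, 6, +1))
```

Without a shift the toy sequence stops after two residues; with either shift the TAA codon is no longer read in frame, so the product is a fusion of the in-frame residues and the alternate-frame residues, as the background text describes.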
