
Dataset Information


Phylogenomics of prokaryotic ribosomal proteins.


ABSTRACT: Archaeal and bacterial ribosomes contain more than 50 proteins, including 34 that are universally conserved in the three domains of cellular life (bacteria, archaea, and eukaryotes). Despite this high sequence conservation, ribosomal (r-) protein genes are often difficult to annotate because they are short and have a biased sequence composition. We developed an automated computational pipeline for identifying r-protein genes and applied it to 995 completely sequenced bacterial and 87 archaeal genomes available in the RefSeq database. The pipeline uses curated seed alignments of r-proteins to run position-specific scoring matrix (PSSM)-based BLAST searches against six-frame genome translations, which mitigates possible gene annotation errors. Using this pipeline, we performed a census of prokaryotic r-protein complements, enumerated missing and paralogous r-proteins, and analyzed the distribution of r-protein genes among chromosomal partitions. Phyletic patterns of bacterial and archaeal r-protein genes were mapped onto phylogenetic trees reconstructed from concatenated r-protein alignments, revealing a history that likely involved multiple independent gains and losses. These alignments, available for download, can be used as search profiles to improve the annotation of r-protein genes and for further comparative genomics studies.
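To illustrate why searching six-frame translations sidesteps annotation errors, the sketch below translates a genomic DNA string in all six reading frames and scans each frame with a position-specific scoring matrix. This is a deliberately simplified stand-in, not the authors' pipeline: the actual searches used PSSM-based BLAST, whereas this toy performs an ungapped sliding-window scan with a fixed mismatch penalty. The function names, the toy PSSM and the threshold are hypothetical. Only the standard genetic code (NCBI translation table 1) is assumed.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def pssm_scan(protein: str, pssm: list[dict[str, int]],
              threshold: int, mismatch: int = -4) -> list[tuple[int, int]]:
    """Ungapped scan (hypothetical toy): sum per-column scores for every
    window; residues absent from a column receive the `mismatch` score."""
    width = len(pssm)
    hits = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        score = sum(col.get(aa, mismatch) for col, aa in zip(pssm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def search_six_frames(dna: str, pssm: list[dict[str, int]],
                      threshold: int) -> list[tuple[str, int, int]]:
    """Report (frame, amino-acid offset, score) for hits in any frame,
    independently of where (or whether) genes were annotated."""
    return [(frame, start, score)
            for frame, protein in six_frame_translation(dna).items()
            for start, score in pssm_scan(protein, pssm, threshold)]


if __name__ == "__main__":
    toy_pssm = [{"K": 5}, {"R": 5}]  # hypothetical two-column profile
    print(six_frame_translation("ATGAAACGT"))
    print(search_six_frames("ATGAAACGT", toy_pssm, threshold=8))
```

For the toy input `ATGAAACGT`, frame +1 translates to `MKR`, so the profile matching `KR` is reported as a hit at amino-acid offset 1 of frame +1. Scanning every frame in this way finds a match even when a short gene was missed or mis-called by the original annotation, which is the motivation given in the abstract.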

SUBMITTER: Yutin N 

PROVIDER: S-EPMC3353972 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature


Publications

Phylogenomics of prokaryotic ribosomal proteins.

Yutin N, Puigbò P, Koonin EV, Wolf YI

PLoS ONE, 16 May 2012, issue 5

Similar Datasets

S-EPMC6942260 (biostudies-literature)
S-EPMC6071974 (biostudies-literature)
S-EPMC6009695 (biostudies-literature)
S-EPMC2636767 (biostudies-literature)
S-EPMC6006769 (biostudies-literature)
S-EPMC1524752 (biostudies-literature)
S-EPMC5529175 (biostudies-literature)
S-EPMC6299218 (biostudies-literature)
S-EPMC4592705 (biostudies-literature)
EGAS00001002405 (EGA)