Dataset Information

Predicting Shine-Dalgarno sequence locations exposes genome annotation errors.


ABSTRACT: In prokaryotes, Shine-Dalgarno (SD) sequences, nucleotides upstream from start codons on messenger RNAs (mRNAs) that are complementary to ribosomal RNA (rRNA), facilitate the initiation of protein synthesis. The location of SD sequences relative to start codons and the stability of the hybridization between the mRNA and the rRNA correlate with the rate of synthesis. Thus, accurate characterization of SD sequences enhances our understanding of how an organism's transcriptome relates to its cellular proteome. We implemented the Individual Nearest Neighbor Hydrogen Bond model for oligo-oligo hybridization and created a new metric, relative spacing (RS), to identify both the location and the hybridization potential of SD sequences by simulating the binding between mRNAs and single-stranded 16S rRNA 3' tails. In 18 prokaryote genomes, we identified 2,420 genes out of 58,550 where the strongest binding in the translation initiation region included the start codon, deviating from the expected location for the SD sequence of five to ten bases upstream. We designated these as RS+1 genes. Additional analysis uncovered an unusual start-codon bias: the majority of the RS+1 genes used GUG, not AUG. Furthermore, of the 624 RS+1 genes whose SD sequence was associated with a free energy release of less than -8.4 kcal/mol (strong RS+1 genes), 384 were within 12 nucleotides upstream of in-frame initiation codons. The most likely explanation for the unexpected location of the SD sequence for these 384 genes is mis-annotation of the start codon. In this way, the new RS metric provides an improved method for gene sequence annotation. The remaining strong RS+1 genes appear to have their SD sequences in an unexpected location that includes the start codon. Thus, our RS metric provides a new way to explore the role of rRNA-mRNA nucleotide hybridization in translation initiation.
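
The scanning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a simple count of complementary bases (Watson-Crick plus G-U wobble) stands in for the Individual Nearest Neighbor Hydrogen Bond free-energy calculation, the rRNA tail `ACCUCCUUA` is an assumed E. coli-like 16S 3' end, and the function names, the window sizes, and the `offset` convention (position of the binding site's last base relative to the first base of the annotated start codon) are hypothetical and may differ from the paper's definition of RS. The second function reads the "within 12 nucleotides upstream of in-frame initiation codons" criterion as a search for an in-frame start codon shortly downstream of the binding site, which is also an assumption.

```python
# Sketch only: a base-pair count replaces the INN-HB free-energy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
START_CODONS = {"AUG", "GUG", "UUG"}
RRNA_TAIL = "ACCUCCUUA"  # assumed 16S rRNA 3' tail, written 5'->3'


def pairing_score(mrna_window, rrna_tail):
    """Count paired positions when the two strands align antiparallel."""
    return sum((m, r) in PAIRS for m, r in zip(mrna_window, reversed(rrna_tail)))


def best_binding_offset(mrna, start_index, rrna_tail=RRNA_TAIL,
                        upstream=20, downstream=3):
    """Slide the tail along the initiation region; return (offset, score).

    offset = index of the window's last base minus start_index
    (hypothetical convention, not necessarily the paper's RS).
    """
    size = len(rrna_tail)
    first = max(0, start_index - upstream)
    last = min(len(mrna) - size, start_index + downstream)
    candidates = [
        (i + size - 1 - start_index, pairing_score(mrna[i:i + size], rrna_tail))
        for i in range(first, last + 1)
    ]
    return max(candidates, key=lambda c: c[1])


def inframe_starts_after(mrna, annotated_start, site_end, max_gap=12):
    """List in-frame start codons beginning within max_gap bases after site_end."""
    hits = []
    for pos in range(annotated_start + 3, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in START_CODONS and 0 < pos - site_end <= max_gap:
            hits.append(pos)
    return hits


# Toy mRNA: the binding site overlaps an annotated GUG at index 9,
# and an in-frame AUG follows at index 15.
toy = "CCUAAGGAGGUGCCCAUGGCC"
offset, score = best_binding_offset(toy, start_index=9)
alternatives = inframe_starts_after(toy, annotated_start=9, site_end=9 + offset)
```

In this toy sequence the strongest pairing overlaps the annotated GUG, and an in-frame AUG sits a few bases downstream, which mirrors the mis-annotation scenario the abstract describes. A faithful reimplementation would replace `pairing_score` with nearest-neighbor free energies and apply the -8.4 kcal/mol threshold instead of a base-pair count.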

SUBMITTER: Starmer J 

PROVIDER: S-EPMC1463019 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC6107228 | biostudies-literature
| S-EPMC7263185 | biostudies-literature
| S-EPMC5711450 | biostudies-literature
2012-04-01 | GSE35641 | GEO
| S-EPMC3338875 | biostudies-literature
| S-EPMC3045613 | biostudies-literature
2012-04-01 | E-GEOD-35641 | biostudies-arrayexpress
| S-EPMC5303271 | biostudies-literature
| S-EPMC4735710 | biostudies-literature
| S-EPMC6311106 | biostudies-literature