Dataset Information

Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes.

ABSTRACT:

Background

Computational prediction methods are currently used to identify genes in prokaryote genomes. However, identification of the correct translation initiation sites remains a difficult task. Accurate translation initiation sites (TISs) are important not only for the annotation of unknown proteins but also for the prediction of operons, promoters, and small non-coding RNA genes, because these predictions typically rely on the intergenic distance. A further problem is that most existing methods are optimized for Escherichia coli data sets; applying them to newly sequenced bacterial genomes may not yield an equivalent level of accuracy.

Results

Based on a biological representation of the translation process, we applied Bayesian statistics to create a score function for predicting translation initiation sites. In contrast to existing programs, our combination of methods uses supervised learning to make optimal use of the set of known translation initiation sites. We combined the Ribosome Binding Site (RBS) sequence, the distance between the translation initiation site and the RBS sequence, the base composition of the start codon, the nucleotide composition (A-rich sequences) following start codons, and the expected distribution of the protein length in a Bayesian scoring function. To further increase the prediction accuracy, we also took into account the operon orientation. The procedure achieved a prediction accuracy of 93.2% in 858 E. coli genes from the EcoGene data set and 92.7% in a data set of 1243 Bacillus subtilis 'non-y' genes. We confirmed the performance in the GC-rich Gamma-Proteobacteria Herminiimonas arsenicoxydans, Pseudomonas aeruginosa, and Burkholderia pseudomallei K96243.
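The way several independent sequence features can be combined into one score may be easier to see in code. The following is a minimal sketch only, assuming a naive-Bayes-style sum of log-likelihood ratios over four of the features named above (start codon, RBS match, RBS spacing, A-richness after the start codon). It is not the Hon-yaku implementation: the function names, the `AGGAGG` consensus, the optimal spacing, and every probability are hypothetical placeholders, whereas a supervised method would estimate them from known TISs. Protein length and operon orientation are left out for brevity.

```python
import math

# All numbers below are illustrative assumptions, NOT values from the paper.
START_CODON_P = {"ATG": 0.83, "GTG": 0.14, "TTG": 0.03}       # assumed P(codon | true TIS)
BACKGROUND_CODON_P = {"ATG": 0.40, "GTG": 0.35, "TTG": 0.25}  # assumed P(codon | false TIS)
SD_CORE = "AGGAGG"        # assumed Shine-Dalgarno-like consensus
OPTIMAL_SPACING = 7       # assumed preferred gap (nt) between RBS and start codon
SPACING_SD = 2.0          # assumed spread of that gap


def best_rbs_match(upstream):
    """Return (matches, spacing) of the best ungapped SD_CORE alignment.

    spacing is the number of nucleotides between the motif and the start codon;
    it is None when no window matches at least one base.
    """
    best = (0, None)
    for i in range(len(upstream) - len(SD_CORE) + 1):
        window = upstream[i:i + len(SD_CORE)]
        matches = sum(a == b for a, b in zip(window, SD_CORE))
        if matches > best[0]:
            best = (matches, len(upstream) - (i + len(SD_CORE)))
    return best


def log_odds_score(seq, start):
    """Sum of per-feature log-likelihood ratios for a candidate start position."""
    codon = seq[start:start + 3]
    if codon not in START_CODON_P:
        return float("-inf")  # not a recognised start codon
    score = math.log(START_CODON_P[codon] / BACKGROUND_CODON_P[codon])

    # RBS term: matching bases raise the score, mismatches lower it.
    matches, spacing = best_rbs_match(seq[max(0, start - 20):start])
    mismatches = len(SD_CORE) - matches
    score += matches * math.log(0.7 / 0.25) + mismatches * math.log(0.1 / 0.25)

    # Spacing term: Gaussian log-density up to an additive constant.
    if spacing is not None:
        score -= (spacing - OPTIMAL_SPACING) ** 2 / (2 * SPACING_SD ** 2)

    # A-richness term over up to 12 nt following the start codon.
    downstream = seq[start + 3:start + 15]
    n_a = downstream.count("A")
    score += n_a * math.log(0.35 / 0.25) + (len(downstream) - n_a) * math.log(0.65 / 0.75)
    return score


def pick_best_start(seq, candidates):
    """Return the candidate start position with the highest combined score."""
    return max(candidates, key=lambda pos: log_odds_score(seq, pos))
```

Given a sequence and a list of in-frame candidate start positions, `pick_best_start(seq, candidates)` returns the position whose combined evidence is strongest. Summing log ratios treats the features as conditionally independent, which is a simplifying assumption of this sketch.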

Conclusion

Hon-yaku, being based on a careful choice of elements important in translation, improved the prediction accuracy in B. subtilis data sets and other bacteria except for E. coli. We believe that most remaining mispredictions are due to atypical ribosomal binding sequences used in specific translation control processes, or likely errors in the training data sets.

SUBMITTER: Makita Y

PROVIDER: S-EPMC1805508 | biostudies-literature | 2007 Feb

REPOSITORIES: biostudies-literature

Publications

Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes.

Makita Yuko, de Hoon Michiel J L, Danchin Antoine

BMC Bioinformatics, 2007 Feb 8

PMID: 17286872

Similar Datasets

Dynamic evolution of translation initiation mechanisms in prokaryotes. | S-EPMC2851962 | biostudies-literature

Comparison of computational methods for identifying translation initiation sites in EST data. | S-EPMC375524 | biostudies-literature

TITER: predicting translation initiation sites by deep learning. | S-EPMC5870772 | biostudies-literature

Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes. | S-EPMC3160421 | biostudies-literature

Unraveling the plasticity of translation initiation in prokaryotes: Beyond the invariant Shine-Dalgarno sequence. | S-EPMC10783764 | biostudies-literature

Translation from unconventional 5' start sites drives tumour initiation. | S-EPMC5287289 | biostudies-literature

Quantitative analysis of mammalian translation initiation sites by FACS-seq. | S-EPMC4299517 | biostudies-literature

Tandem repeats ubiquitously flank and contribute to translation initiation sites. | S-EPMC9331589 | biostudies-literature

TRII: A Probabilistic Scoring of Drosophila melanogaster Translation Initiation Sites. | S-EPMC3171364 | biostudies-literature