REPARATION: Ribosome Profiling Assisted (Re-) Annotation of Bacterial genomes
Ontology highlight
ABSTRACT: The delineation of genes in bacteria has remained an important challenge because prokaryotic genomes are often tightly packed frequently resulting in overlapping genes. We hereby present a de novo approach called REPARATION (RibosomeE Profiling Assisted (Re-)AnnotaTION) to delineate translated open reading frames (ORFs) in bacteria independent of (available) genome annotation. By deep sequencing of ribosome protected mRNA fragments (RPF) to map translating ribosomes across the entire genome, REPARATION takes advantage of the recently developed ribosome profiling (Ribo-seq) technique. REPARATION starts by traversing the entire genome to generate all possible ORFs and then collects their corresponding RPF signal information. Based on a growth curve model to estimate minimum ORF read density and Ribo-seq RPF coverage, thresholds indicative of translation is estimated. Finally, our algorithm applies a random forest model to build a classifier to classify putative protein coding ORFs. We evaluated the performance of REPARATION on 3 annotated bacterial species using in-house generated Ribo-seq data and matching N-terminal and shotgun proteomics data next to publically available Ribo-seq data. In all cases, about 80% of the ORFs predicted by REPARATION were previously annotated as protein coding. While 13-20% were variants of previously annotated ORFs and about 3-4% point to novel translated ORFs within intergenic or other regions previously annotated as non-coding. Without stringent length restrictions REPARATION was able to identify several small ORFs (sORFs). Multiple supportive evidence from matching MS data and sequence conservation analysis was obtained to validate predicted ORFs.
ORGANISM(S): Salmonella enterica subsp. enterica serovar Typhimurium str. SL1344
PROVIDER: GSE91066 | GEO | 2017/01/30
SECONDARY ACCESSION(S): PRJNA356772
REPOSITORIES: GEO
ACCESS DATA