Project description:Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated techniques depend heavily on sequence context and often underestimate the complexity of the proteome. We developed REPARATION (RibosomE Profiling Assisted (Re-)AnnotaTION), a de novo algorithm that takes advantage of experimental evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation. Ribo-seq is a next-generation sequencing technique that provides a genome-wide snapshot of the positions of translating ribosomes along mRNAs at the time of the experiment. REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds to screen out spurious ORFs based on a growth curve model. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive, experimentally supported map of their translation landscape. In all cases, we identified hundreds of novel ORFs, including variants of previously annotated ORFs and novel small ORFs (<71 codons). Our predictions were supported by matching mass spectrometry (MS) proteomics data and sequence conservation analysis. REPARATION is unique in that it uses experimental Ribo-seq data to perform de novo ORF delineation in bacterial genomes and can thus identify putative coding ORFs irrespective of the sequence context of the reading frame.
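The growth-curve thresholding step described above can be illustrated with a small sketch. The description does not say which curve is fitted or which point on it becomes the cut-off, so everything here is an assumption: a logistic curve is fitted to the cumulative fraction of ORFs at or below each read-density value, using linear least squares on the logit scale (a linearised fit, not a full nonlinear one), and the threshold is read off where the fitted curve reaches a chosen fraction of its plateau (the inflection point by default). The names `cumulative_fraction`, `fit_logistic`, `threshold_at` and the `fraction` parameter are hypothetical, not REPARATION's API.

```python
import math
from typing import List, Sequence, Tuple


def cumulative_fraction(values: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Fraction of values (e.g. per-ORF read densities) that are <= each threshold."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    return [sum(1 for v in values if v <= t) / n for t in thresholds]


def fit_logistic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = 1 / (1 + exp(-k * (x - x0))) by least squares on logit(y).

    Points with y == 0 or y == 1 are skipped because their logit is infinite.
    Returns the growth rate k and the inflection point x0.
    """
    pts = [(x, math.log(y / (1.0 - y))) for x, y in zip(xs, ys) if 0.0 < y < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two points with 0 < y < 1")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in pts)
    if sxx == 0.0 or sxz == 0.0:
        raise ValueError("x values are constant or the curve is flat; cannot fit")
    k = sxz / sxx                    # slope of logit(y) against x
    intercept = mean_z - k * mean_x  # logit(y) = k*x + intercept = k*(x - x0)
    return k, -intercept / k


def threshold_at(k: float, x0: float, fraction: float = 0.5) -> float:
    """x at which the fitted curve reaches `fraction` of its plateau (0.5 = inflection)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return x0 + math.log(fraction / (1.0 - fraction)) / k
```

In use, one would compute `cumulative_fraction` over a grid of candidate read densities, pass the grid and the resulting fractions to `fit_logistic`, and call `threshold_at` with whatever plateau fraction is considered indicative of translation; that choice is a tuning decision the description leaves open.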
Project description:This study describes the combined sequencing of the genomes and transcriptomes of single blastomeres from mouse 8-cell stage embryos.
Project description:The delineation of genes in bacteria remains an important challenge because prokaryotic genomes are tightly packed, frequently resulting in overlapping genes. We present a de novo approach called REPARATION (RibosomE Profiling Assisted (Re-)AnnotaTION) to delineate translated open reading frames (ORFs) in bacteria independent of (available) genome annotation. REPARATION takes advantage of the recently developed ribosome profiling (Ribo-seq) technique, which maps translating ribosomes across the entire genome by deep sequencing of ribosome-protected mRNA fragments (RPFs). REPARATION starts by traversing the entire genome to generate all possible ORFs and then collects the corresponding RPF signal for each. A growth curve model is then used to estimate minimum ORF read density and RPF coverage thresholds indicative of translation. Finally, the algorithm trains a random forest classifier to identify putative protein-coding ORFs. We evaluated the performance of REPARATION on 3 annotated bacterial species using in-house generated Ribo-seq data with matching N-terminal and shotgun proteomics data, as well as publicly available Ribo-seq data. In all cases, about 80% of the ORFs predicted by REPARATION were previously annotated as protein coding, 13-20% were variants of previously annotated ORFs, and about 3-4% pointed to novel translated ORFs within intergenic or other regions previously annotated as non-coding. Because it applies no stringent length restrictions, REPARATION was also able to identify several small ORFs (sORFs). Multiple lines of supporting evidence from matching MS data and sequence conservation analysis were obtained to validate the predicted ORFs.
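The first REPARATION step, traversing the genome to generate all possible ORFs, can be sketched as a six-frame scan. This is a minimal illustration under stated assumptions, not REPARATION's implementation: it assumes ATG, GTG and TTG as start codons and, for each stop codon, reports only the longest ORF (the most upstream in-frame start); the tool's own start-codon set and its handling of alternative starts may differ. `enumerate_orfs`, `Orf` and `min_codons` are hypothetical names, and the RPF collection, growth-curve filtering and random forest steps are not shown.

```python
from typing import List, NamedTuple, Tuple

START_CODONS = {"ATG", "GTG", "TTG"}  # assumed bacterial start set
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    start: int   # 0-based, inclusive, forward-strand coordinate
    end: int     # 0-based, exclusive; the stop codon is included
    strand: str  # "+" or "-"
    frame: int   # 0-2, relative to the start of the scanned strand


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def _scan(seq: str, min_codons: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) of the longest start-to-stop ORF ending at each stop."""
    hits = []
    for frame in range(3):
        open_start = None  # most upstream start codon seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                # Length is counted in codons, including the stop codon.
                if open_start is not None and (i + 3 - open_start) // 3 >= min_codons:
                    hits.append((open_start, i + 3, frame))
                open_start = None
            elif codon in START_CODONS and open_start is None:
                open_start = i
    return hits


def enumerate_orfs(genome: str, min_codons: int = 10) -> List[Orf]:
    """Candidate ORFs on both strands, in forward-strand coordinates."""
    genome = genome.upper()
    n = len(genome)
    orfs = [Orf(s, e, "+", f) for s, e, f in _scan(genome, min_codons)]
    for s, e, f in _scan(reverse_complement(genome), min_codons):
        # Map the half-open interval [s, e) on the reverse complement back to [n - e, n - s).
        orfs.append(Orf(n - e, n - s, "-", f))
    return sorted(orfs)
```

For example, `enumerate_orfs("ATGATGAAATAA", 3)` returns a single forward-strand ORF spanning positions 0 to 12, because the second in-frame ATG shares the same stop codon and only the longest ORF per stop is kept. Keeping every start instead would produce the nested candidates among which a classifier would then have to choose.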
Project description:Whole-genome PacBio sequencing of laboratory mouse strains. See http://www.sanger.ac.uk/resources/mouse/genomes/ for more details. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
Project description:Identifying small open reading frames (smORFs) and smORF-encoded peptides (SEPs) is technically and computationally challenging. Experimentally, techniques such as ribosome profiling (Ribo-Seq) and mass spectrometry (MS) are used. Ribo-Seq sequences ribosome-protected mRNA fragments but does not reveal the translated reading frame, so proteins encoded by overlapping ORFs cannot be identified with it. Herein we used MS to characterize the smORFomes of different Mycoplasma species. These data are used to corroborate the predictions of a random forest classifier that predicts, in silico, all putative SEPs encoded by different bacterial genomes.
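The frame limitation of Ribo-Seq noted above matters wherever two ORFs on the same strand overlap in different reading frames: footprints in the shared region cannot, by coverage alone, be attributed to one frame. The sketch below flags such pairs. It assumes ORFs are given as 0-based half-open intervals in forward-strand coordinates; the `Interval` type and the function names are hypothetical, and the all-pairs comparison is quadratic, so a genome-scale version would sort the intervals and sweep instead.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    name: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def reading_frame(orf: Interval) -> int:
    """Frame in forward-strand coordinates.

    Plus-strand codons are read from `start`, so frame = start mod 3; minus-strand
    codons are read backwards from `end`, so two minus-strand ORFs share a frame
    exactly when their ends are congruent mod 3.
    """
    return orf.start % 3 if orf.strand == "+" else orf.end % 3


def out_of_frame_overlaps(orfs: List[Interval]) -> List[Tuple[str, str, int]]:
    """(name_a, name_b, overlap_nt) for same-strand ORF pairs overlapping in different frames."""
    result = []
    for a, b in combinations(orfs, 2):
        if a.strand != b.strand or reading_frame(a) == reading_frame(b):
            continue
        overlap = min(a.end, b.end) - max(a.start, b.start)
        if overlap > 0:
            result.append((a.name, b.name, overlap))
    return result
```

Pairs reported here are the cases where independent evidence, such as the MS data described above, is needed to decide which frame is actually translated.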