ABSTRACT: We used ribosome profiling (Ribo-seq) combined with RNA-seq to explore the translational landscape of tomato roots. We generated three biological replicates of RNA-seq and Ribo-seq data for tomato roots. We then used the RNA-seq data for de novo transcriptome assembly and the Ribo-seq data to identify novel translated open reading frames (ORFs). Our data revealed more than 300 novel translated ORFs on previously unannotated transcripts. Most of the newly identified ORFs are small and difficult to detect with in silico methods. We also identified over 1,300 upstream ORFs (uORFs) on annotated genes. These data could facilitate gene annotation. They also indicate that uORFs, miRNAs and antisense RNAs regulate the expression of their associated genes. This study uncovered mechanisms of translational regulation and improved gene annotation in tomato.
Project description:Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated techniques depend heavily on sequence context and often underestimate the complexity of the proteome. We developed REPARATION (RibosomeE Profiling Assisted (Re-)AnnotaTION), a de novo algorithm that takes advantage of experimental evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation. Ribo-seq is a next-generation sequencing technique that provides a genome-wide snapshot of the positions of translating ribosomes along mRNAs at the time of the experiment. REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds to screen for spurious ORFs based on a growth curve model. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel ORFs, including variants of previously annotated ORFs and novel small ORFs (<71 codons). Our predictions were supported by matching mass spectrometry (MS) proteomics data and sequence conservation analysis. REPARATION is unique in that it makes use of experimental Ribo-seq data to perform de novo ORF delineation in bacterial genomes, and thus can identify putative coding ORFs irrespective of the sequence context of the reading frame.
Project description:The delineation of genes in bacteria has remained an important challenge because prokaryotic genomes are often tightly packed, frequently resulting in overlapping genes. We present a de novo approach called REPARATION (RibosomeE Profiling Assisted (Re-)AnnotaTION) to delineate translated open reading frames (ORFs) in bacteria independent of (available) genome annotation. REPARATION takes advantage of the recently developed ribosome profiling (Ribo-seq) technique, which maps translating ribosomes across the entire genome by deep sequencing of ribosome-protected mRNA fragments (RPFs). REPARATION starts by traversing the entire genome to generate all possible ORFs and then collects their corresponding RPF signal information. Thresholds indicative of translation, namely minimum ORF read density and Ribo-seq RPF coverage, are estimated with a growth curve model. Finally, the algorithm trains a random forest classifier to identify putative protein-coding ORFs. We evaluated the performance of REPARATION on three annotated bacterial species using in-house generated Ribo-seq data and matching N-terminal and shotgun proteomics data, alongside publicly available Ribo-seq data. In all cases, about 80% of the ORFs predicted by REPARATION were previously annotated as protein coding, while 13-20% were variants of previously annotated ORFs and about 3-4% pointed to novel translated ORFs within intergenic or other regions previously annotated as non-coding. Without stringent length restrictions, REPARATION was able to identify several small ORFs (sORFs). Multiple lines of supporting evidence from matching MS data and sequence conservation analysis were obtained to validate the predicted ORFs.
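The first step described above, traversing the genome to generate all possible ORFs, can be illustrated with a minimal sketch. This is not REPARATION's code: the start codon set (ATG, GTG, TTG), the `min_codons` parameter and the function names are assumptions made for illustration only.

```python
# Illustrative sketch only; not taken from REPARATION.
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(genome, min_codons=10):
    """Yield (strand, start, end, sequence) for every start-to-stop ORF.

    All three frames of both strands are scanned. Coordinates are 0-based,
    end-exclusive, and relative to the strand being scanned. min_codons
    counts the start codon but not the stop codon (hypothetical parameter).
    """
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            open_starts = []  # in-frame starts seen since the last stop codon
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if codon in START_CODONS:
                    open_starts.append(pos)
                elif codon in STOP_CODONS:
                    # Every start upstream of this stop defines one candidate.
                    for start in open_starts:
                        if (pos - start) // 3 >= min_codons:
                            yield strand, start, pos + 3, seq[start:pos + 3]
                    open_starts = []
```

Calling `list(find_orfs(genome_sequence))` returns every candidate, including nested ORFs that share a stop codon but differ in start site. In a pipeline of the kind described, each candidate would then be paired with its RPF signal before any thresholding or classification.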
Project description:We used ribosome profiling (Ribo-seq) combined with RNA-seq to explore the translational landscape of Arabidopsis Col-0 seedlings. We generated six biological replicates of RNA-seq and Ribo-seq data for Arabidopsis Col-0 seedlings. Three of the replicates were collected after 20 minutes of 0.1% DMSO treatment and the other three were collected after 60 minutes of DMSO treatment. The resulting RNA-seq and Ribo-seq files were used to discover translated upstream ORFs (uORFs) and to analyze the translation efficiency of uORF-containing genes in Arabidopsis.
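The description does not say how translation efficiency was calculated. A common convention, assumed here, is the ratio of ribosome footprint abundance to mRNA abundance for the same gene, often reported on a log2 scale. The sketch below uses hypothetical names, normalized abundances (for example TPM) and a pseudocount to avoid division by zero.

```python
import math


def translation_efficiency(ribo_abundance, rna_abundance, pseudocount=1.0):
    """Return log2((Ribo-seq + pseudocount) / (RNA-seq + pseudocount)).

    Both inputs are assumed to be normalized abundances (e.g. TPM) for the
    same gene. The pseudocount is an illustrative choice, not a value taken
    from the study.
    """
    return math.log2((ribo_abundance + pseudocount) / (rna_abundance + pseudocount))


def te_by_gene(ribo, rna, pseudocount=1.0):
    """Compute translation efficiency for every gene present in both tables."""
    shared = ribo.keys() & rna.keys()
    return {g: translation_efficiency(ribo[g], rna[g], pseudocount) for g in shared}
```

With this definition, a gene whose footprint abundance is low relative to its mRNA level gets a negative value, which is the pattern one would look for when asking whether uORF-containing genes are translated less efficiently.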
Project description:Identifying smORFs and SEPs is technically and computationally challenging. Experimentally, techniques such as ribosome profiling (Ribo-Seq) and mass spectrometry (MS) are used. Ribo-Seq sequences the mRNA and does not provide the translated frame; thus, identifying proteins encoded by overlapping ORFs is not feasible. Herein we have used MS to characterize the smORFomes of different Mycoplasma species. These data are used to corroborate the predictions of a random forest classifier that predicts in silico all the putative SEPs encoded by different bacterial genomes.
Project description:Identifying smORFs and SEPs is technically and computationally challenging. Experimentally, techniques such as ribosome profiling (Ribo-Seq) and mass spectrometry (MS) are used. Ribo-Seq sequences the mRNA and does not provide the translated frame; thus, identifying proteins encoded by overlapping ORFs is not feasible. Herein we have used MS to characterize the smORFomes of different Mycoplasma species and Escherichia coli. These data are used to corroborate the predictions of a random forest classifier that predicts in silico all the putative SEPs encoded by different bacterial genomes.
Project description:Identifying smORFs and SEPs is technically and computationally challenging. Experimentally, techniques such as ribosome profiling (Ribo-Seq) and mass spectrometry (MS) are used. Ribo-Seq sequences the mRNA and does not provide the translated frame; thus, identifying proteins encoded by overlapping ORFs is not feasible. Herein we have used MS to characterize the smORFomes of different Mycoplasma species, Escherichia coli, Staphylococcus aureus and Pseudomonas aeruginosa. These data are used to corroborate the predictions of a random forest classifier that predicts in silico all the putative SEPs encoded by different bacterial genomes.
Project description:Ribosome profiling has revealed translation outside of canonical coding sequences (CDSs), including translation of short upstream open reading frames (ORFs), long non-coding RNAs, ORFs in UTRs and ORFs in alternative reading frames. Ribo-seq, as well as bioinformatics-based prediction and RNA sequencing, has reported translation of thousands of ORFs derived from non-translated regions (NTRs). Although such ORFs have gained increased attention over the years, their actual coding potential remains debated, as protein products of only a fraction of them have been identified by mass spectrometry. Here, we introduce a new workflow to discover translation products of NTRs at large scale. We combined reduced sample complexity (by enriching N-terminal peptides of cytosolic proteins, as such peptides are ideal proxies for translation) with an extended search space (combining the sequences of UniProt proteins, UniProt isoforms and publicly available Ribo-seq data), reasoning that this combination would increase the chances of identifying proteins from NTRs. Further, we introduced rigorous data analysis and results curation workflows to cope with the increased complexity of the search space and to mine identified peptides. This stringent filtering approach was found essential to retain confident translational evidence at the peptide level for NTRs. We show that, theoretically, our strategy facilitates the detection of translation events of transcripts from NTRs, but experimentally less than 1% of all identified peptides might originate from such translation events.
Project description:Ribosome profiling has revealed translation outside of canonical coding sequences (CDSs), including translation of short upstream ORFs, long non-coding RNAs, overlapping ORFs, ORFs in UTRs and ORFs in alternative reading frames. Studies combining mass spectrometry, ribosome profiling and CRISPR-based screens showed that hundreds of ORFs derived from non-coding transcripts produce (micro)proteins and might be functional, while other studies failed to find evidence for such types of non-canonical translation events. In this study we tried to detect (and characterize) proteins originating from these non-translated regions (NTRs). We attempted to discover translation products from non-coding regions at large scale by reducing the overall sample complexity (by enriching cytosolic N-terminal peptides) and combining this with an extended search space (combining UniProt proteins, UniProt isoforms and publicly available Ribo-seq data), reasoning that this strategy would increase the likelihood of identifying proteins from NTRs. Further, we introduced rigorous data analysis and results curation workflows. This stringent filtering approach was found essential to retain confident translational evidence at the peptide level for NTRs. We show that, theoretically, our strategy facilitates the detection of translation events of transcripts from NTRs, but experimentally we find that less than 1% of all identified peptides might originate from such translation events. However, one NTR protein was further characterized by Virotrap-based interaction analysis. This resulted in several potential interaction partners associated with membranes and vesicle transport, suggesting that the non-translated regions that do give rise to proteins might be functional.
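The extended search space described above combines sequences from several sources. How the authors built their database is not stated, so the following is only a sketch of one plausible approach: merge FASTA-formatted sources in priority order and keep a single entry per distinct sequence. The function names, the source labels and the deduplication rule are all assumptions.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_search_space(sources):
    """Merge FASTA sources, keeping the first label seen for each sequence.

    sources: list of (label, fasta_text) pairs in priority order, so a
    sequence present in both a canonical and a Ribo-seq-derived source is
    attributed to whichever source is listed first (hypothetical rule).
    Returns a dict mapping sequence -> "label|header".
    """
    merged = {}
    for label, text in sources:
        for header, seq in parse_fasta(text):
            merged.setdefault(seq, f"{label}|{header}")
    return merged
```

Listing the canonical source first means a peptide is attributed to an NTR-derived entry only when its sequence is absent from the canonical proteins, which fits the conservative filtering the description emphasizes.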
Project description:Immunopeptides that are translated in cells and presented at the cell surface by major histocompatibility complex (MHC) molecules are important epitopes in basic immunology and translational cancer immunotherapy. While most of the reported immunopeptides are derived from protein-coding sequences, non-canonical peptides translated from “non-coding” regions are emerging and have attracted much attention in recent years. However, sensitive and accurate identification of such peptides remains a challenging task. Here we report an optimized approach integrating Ribo-seq and mass spectrometry to identify hundreds of non-canonical MHC-binding peptides. Three pipelines for analyzing Ribo-seq data were compared to generate small open reading frame (sORF) databases. We also combined bottom-up and de novo searching in the proteomics data analysis, which identified more immunopeptides. In total, 7902 canonical and 308 non-canonical immunopeptides were identified, and selected ones were rigorously validated. The present study provides a practical solution for identifying non-canonical MHC epitopes. The novel immunopeptides may help resolve mechanisms of cancer antigen presentation and may have applications in cancer immunotherapies.