Project description:BackgroundThe quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing number of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data.ResultsWe developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: (i) full-length RNA-seq for detection of splicing patterns and (ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings and Saccharomyces cerevisiae cells as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the most commonly used community gene models, TAIR10 and Araport11 for A.thaliana and SacCer3 for S.cerevisiae. In particular, we identify multiple transient transcripts missing from the existing annotations. Our new annotations promise to improve the quality of A.thaliana and S.cerevisiae genome research.ConclusionsOur proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge on the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.
Project description:During production of the original article [1], there was a technical error that resulted in author corrections not being rendered in the PDF version of the article.
Project description:The HTML version of this Article incorrectly omits Supplementary Movie 1. Supplementary Movie 1 can be found as Supplementary Information associated with this Correction.
Project description:Numerous methods have been developed to analyse RNA sequencing (RNA-seq) data, but most rely on the availability of a reference genome, making them unsuitable for non-model organisms. Here we present superTranscripts, a substitute for a reference genome, where each gene with multiple transcripts is represented by a single sequence. The Lace software is provided to construct superTranscripts from any set of transcripts, including de novo assemblies. We demonstrate how superTranscripts enable visualisation, variant detection and differential isoform detection in non-model organisms. We further use Lace to combine reference and assembled transcriptomes for chicken and recover hundreds of gaps in the reference genome.
Project description:In the originally published version of this Article, the affiliation details for Santi González, Jian'an Luan and Claudia Langenberg were inadvertently omitted. Santi González should have been affiliated with 'Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB Research Program in Computational Biology, 08034 Barcelona, Spain', and Jian'an Luan and Claudia Langenberg should have been affiliated with 'MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK'. Furthermore, the abstract contained an error in the SNP ID for the rare variant in chromosome Xq23, which was incorrectly given as rs146662057 and should have been rs146662075. These errors have now been corrected in both the PDF and HTML versions of the Article.
Project description:A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has not been fixed in the paper.
Project description:The original version of this Article contained an error in the Data Availability Statement. The accession code indicated '53V' and should have read '5X3V'. This has been corrected in both PDF and HTML versions of the Article.