Project description:Adenovirus is a common human pathogen that relies on host cell processes for transcription and processing of viral RNA and for protein production. Although adenoviral promoters, splice junctions, and cleavage and polyadenylation sites have been characterized using low-throughput biochemical techniques or short-read cDNA-based sequencing, these technologies do not fully capture the complexity of the adenoviral transcriptome. By combining Illumina short-read and nanopore long-read direct RNA sequencing approaches, we mapped transcription start sites and cleavage and polyadenylation sites across the adenovirus genome. In addition to confirming the known canonical viral early and late RNA cassettes, our analysis of splice junctions within long RNA reads revealed 35 novel viral transcripts. These RNAs include fourteen new splice junctions that lead to expression of canonical open reading frames (ORFs), six novel ORF-containing transcripts, and fifteen transcripts encoding messages that potentially alter protein function through truncation or fusion of canonical ORFs. We also detect RNAs that bypass canonical cleavage sites and generate potential chimeric proteins by linking separate gene transcription units. Among these, we detected an evolutionarily conserved protein containing the N-terminus of E4orf6 fused to the downstream DBP/E2A ORF. Loss of this novel protein, E4orf6/DBP, was associated with aberrant viral replication center morphology and poor viral spread. Our work highlights how long-read sequencing technologies can reveal further complexity within viral transcriptomes.
Project description:Ongoing improvements to next-generation sequencing technologies are leading to longer read lengths, but a thorough understanding of the impact of longer reads on RNA sequencing analyses is lacking. To address this issue, we generated and compared two RNA sequencing datasets of differing read lengths, 2x75 bp (L75) and 2x262 bp (L262), and investigated the impact of read length on various aspects of analysis, including the performance of currently available read-mapping tools, gene and transcript quantification, and detection of allele-specific expression patterns. Our results indicate that, while the scalability of read-mapping tools and the cost-effectiveness of the long-read protocol are issues that require further attention, longer reads enable more accurate quantification of diverse aspects of gene expression, including individual-specific patterns of allele-specific expression and alternative splicing.
Project description:We produced an extensive transcript catalog for lymphoblastoid cell lines (LCLs) of 5 primate species by leveraging isoform sequencing and short-read RNA-seq. The curated transcriptomes were used to assist protein identification by mass spectrometry.
Project description:We evaluated short-read-only, long-read-only, and hybrid assembly approaches on metagenomic samples, demonstrating how the choice of approach affects gene and protein prediction, which is relevant for downstream functional analyses. For a human gut microbiome sample, we used complementary metatranscriptomic and metaproteomic data to evaluate the metagenome-based protein predictions.
Project description:Deregulated gene expression is a hallmark of cancer; however, most studies to date have analyzed short-read RNA sequencing data, which has inherent limitations. Here, we combine PacBio long-read isoform sequencing (Iso-Seq) and Illumina paired-end short-read RNA sequencing to comprehensively survey the transcriptome of gastric cancer (GC), a leading cause of global cancer mortality. We performed full-length transcriptome analysis across 10 GC cell lines covering four major GC molecular subtypes (chromosomal unstable, Epstein-Barr virus positive, genome stable, and microsatellite unstable). We identify 60,239 non-redundant full-length transcripts, of which >66% are novel compared to current transcriptome databases. Novel isoforms are more likely to be cell-line and subtype specific, are expressed at lower levels, and have more exons and longer isoform and coding sequence lengths. Most novel isoforms utilize an alternate first exon and, compared with other alternative splicing categories, are expressed at higher levels and exhibit higher variability. Collectively, we observe alternate promoter usage in 25% of detected genes, with the majority (84.2%) of known/novel promoter pairs exhibiting potential changes in their coding sequences. Mapping these alternate promoters to TCGA GC samples, we identify several cancer-associated isoforms, including novel variants of oncogenes. Tumor-specific transcript isoforms tend to alter protein coding sequences to a larger extent than other isoforms. Analysis of outcome data suggests that novel isoforms may impart additional prognostic information. Our results provide a rich resource of full-length transcriptome data for deeper studies of GC and other gastrointestinal malignancies.
Project description:Long-read SMRT cDNA sequencing of nascent RNA from exponentially growing S. cerevisiae and S. pombe cells was employed to obtain transcription elongation and splicing information from single transcripts. Nascent RNA was prepared from the yeast chromatin fraction (Carrillo Oesterreich, Preibisch, Neugebauer, Mol Cell 2010). The nascent 3′ end was labeled by ligation of a 3′ DNA adaptor. The adaptor sequence served as a template for full-length reverse transcription, and double-stranded cDNA was obtained by PCR (gene-specific or transcriptome-wide). SMRT DNA sequencing libraries were then prepared. Nascent RNA profiles, mainly for intron-containing genes, were generated with long-read SMRT cDNA sequencing.
Project description:Clear cell renal cell carcinoma (ccRCC) is the most common form of kidney cancer. To date, long-read RNA sequencing has not been applied to kidney cancer. Here, we used ONT long-read direct RNA sequencing to profile the transcriptome of the ccRCC cell line RCC4, with and without exposure to pro-inflammatory cytokines. Our results revealed differentially expressed genes induced by the pro-inflammatory cytokines. Moreover, these results indicated a potential tumour origin for novel isoforms and genes that were discovered by long-read sequencing in archival tumour samples.