Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each has its own distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. The TALON pipeline for technology-agnostic, long-read transcriptome discovery and quantification tracks both known and novel transcript models, as well as their expression levels, across datasets, for both simple studies and larger projects such as ENCODE. Such projects seek to decode transcriptional regulation in the human and mouse genomes and to predict expression levels of genes and transcripts more accurately than is possible with short reads alone.
Project description:In this study, we used a barcoding-based synthetic long-read (SLR) isoform sequencing approach (LoopSeq) to generate, on standard short-read Illumina sequencers, sequencing reads sufficiently long and accurate to identify isoforms.
Project description:Zea mays is a leading model for elucidating transcriptional networks in plants, aided by increasingly refined studies of the transcriptome atlas across spatio-temporal, developmental, and environmental dimensions. Limiting this progress are uncertainties about the complete structure of mRNA transcripts, particularly with respect to alternatively spliced isoforms. Although second-generation RNA-seq provides a quantitative assay for transcriptional and posttranscriptional events, the accurate reconstruction of full-length mRNA isoforms is challenging with short-read technologies. By producing much longer reads, third-generation sequencing promises to solve the assembly problem, but can suffer from lower read accuracy and throughput. Here, we combine these complementary technologies to define and quantify high-confidence transcript isoforms in maize. Six tissues (root, pollen, embryo, endosperm, immature ear, and immature tassel) of the B73 inbred line were used for mRNA sequencing with the Illumina HiSeq 2000 PE101 platform to comprehensively quantify gene and isoform expression. In parallel, intact cDNAs from the same samples were sequenced using the PacBio RS II platform. The latter used six size-fractionated libraries (<1kb, 1-2kb, 2-3kb, 3-5kb, 4-6kb, >5kb) to generate more than 2 million full-length reads. Preliminary findings suggest that mechanisms of alternative splicing are differentially employed between tissues. In addition, these data show promise to dramatically improve maize genome annotation by detecting previously unidentified transcript isoforms and uncovering previously unrecognized genes. This submission contains the Illumina HiSeq 2000 PE101 reads.
Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each displays distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. These data show that TALON is a technology-agnostic long-read transcriptome discovery and quantification pipeline capable of tracking both known and novel transcript models, as well as their expression levels, across datasets in both simple studies and larger projects. These properties will enable TALON users to move beyond the limitations of short-read data and perform isoform discovery and quantification in a uniform manner on existing and future long-read platforms.
Project description:Purpose: The goal of this study is to determine the expression profile of HHV-6B in whole blood samples from transplant patients with HHV-6B genomes detected in plasma. Methods: Viral mRNA profiles in whole blood samples from different transplant patients were generated by deep sequencing using Illumina HiSeq. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA, and TopHat followed by Cufflinks. qRT–PCR validation of U38 expression was performed using TaqMan. Results: Hierarchical clustering uncovered systematic detection of U38 mRNA transcripts in whole blood samples from all patients with a detectable HHV-6B reactivation in their plasma. Conclusions: Our study represents the first detailed analysis of viral transcriptomes associated with HHV-6B reactivation in whole blood samples generated by RNA-seq technology. Our results show that NGS offers a comprehensive and more accurate qualitative evaluation of mRNA content within a whole blood sample.
Project description:Adenovirus is a common human pathogen that relies on host cell processes for transcription and processing of viral RNA and for protein production. Although adenoviral promoters, splice junctions, and cleavage and polyadenylation sites have been characterized using low-throughput biochemical techniques or short-read cDNA-based sequencing, these technologies do not fully capture the complexity of the adenoviral transcriptome. By combining Illumina short-read and nanopore long-read direct RNA sequencing approaches, we mapped transcription start sites and cleavage and polyadenylation sites across the adenovirus genome. In addition to confirming the known canonical viral early and late RNA cassettes, our analysis of splice junctions within long RNA reads revealed 35 novel viral transcripts. These RNAs include fourteen new splice junctions that lead to expression of canonical open reading frames (ORFs), six novel ORF-containing transcripts, and fifteen transcripts encoding messages that potentially alter protein function through truncation or fusion of canonical ORFs. We also detect RNAs that bypass canonical cleavage sites and generate potential chimeric proteins by linking separate gene transcription units. Of these, an evolutionarily conserved protein was detected containing the N-terminus of E4orf6 fused to the downstream DBP/E2A ORF. Loss of this novel protein, E4orf6/DBP, was associated with aberrant viral replication center morphology and poor viral spread. Our work highlights how long-read sequencing technologies can reveal further complexity within viral transcriptomes.
Project description:Purpose: Perform an RNA-seq study on endometrial tissues from heifers with short uterine tracts to reveal important genes and biological pathways related to uterine development and physiology. Methods: RNA sequencing was performed on the Illumina platform. Single-end reads in FASTQ format were explored using FastQC. Low-quality bases were trimmed from both the 3’ and 5’ ends until a base with a Phred quality score of 30 (99.9% accurate) or greater was found, and reads having a mean quality score less than 30 and length below 30 nucleotides were filtered out. Cleaned reads were aligned against the bovine reference genome (Bos_taurus.ARS-UCD1.2) using HISAT2. The resulting SAM files were sorted and converted to BAM files using SAMtools. Read counts mapped to bovine gene models were generated using the htseq-count script from the HTSeq package. Bioconductor DESeq2 was used to identify differentially expressed genes between the short and normal uterine tract groups. Conclusions: Heifers with a short uterine tract had significantly reduced endometrial layers and uterine glands, and altered transcriptomic profiles. The decrease in uterine glands probably resulted in lower uterine secretions necessary to support embryo growth and development. As a result, heifers with short uteri were infertile even when they were bred by fertile bulls.
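The read-cleaning step in the Methods above can be illustrated with a minimal Python sketch. It is not the authors' code, and the description does not name the trimming tool they used. The function names, the Phred+33 quality encoding, and the choice to discard a read when either its mean quality or its length falls below 30 are assumptions made for illustration.

```python
from statistics import mean

PHRED_OFFSET = 33   # assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding
MIN_QUALITY = 30    # threshold stated in the Methods
MIN_LENGTH = 30     # threshold stated in the Methods


def phred_scores(quality_string):
    """Decode a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def error_probability(q):
    """Per-base error probability for Phred score q (Q30 -> 0.001, i.e. 99.9% accurate)."""
    return 10 ** (-q / 10)


def trim_read(sequence, quality_string, min_quality=MIN_QUALITY):
    """Trim bases from both ends until a base with Phred >= min_quality is found."""
    scores = phred_scores(quality_string)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_quality:
        start += 1
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], quality_string[start:end]


def keep_read(sequence, quality_string,
              min_quality=MIN_QUALITY, min_length=MIN_LENGTH):
    """Assumed filter: drop reads that are too short or whose mean quality is too low."""
    if len(sequence) < min_length:
        return False
    return mean(phred_scores(quality_string)) >= min_quality


if __name__ == "__main__":
    # Hypothetical read: '#' encodes Phred 2, 'I' encodes Phred 40.
    seq, qual = trim_read("ACGTACGT", "##IIII#I")
    print(seq, qual)
    print(keep_read(seq, qual))
```

Only the outermost low-quality bases are removed; a low-quality base in the interior of a read is kept and instead lowers the mean quality used by the filter.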