Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each displays distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. These data show that TALON is a technology-agnostic long-read transcriptome discovery and quantification pipeline capable of tracking both known and novel transcript models, as well as their expression levels, across datasets in both simple studies and larger projects. These properties will enable TALON users to move beyond the limitations of short-read data to perform isoform discovery and quantification in a uniform manner on existing and future long-read platforms.
Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each technology has its own distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. The TALON pipeline for technology-agnostic, long-read transcriptome discovery and quantification tracks both known and novel transcript models, as well as their expression levels, across datasets, both for simple studies and for larger projects such as ENCODE. Such projects seek to decode transcriptional regulation in the human and mouse genomes in order to predict expression levels of genes and transcripts more accurately than is possible with short reads alone.
Project description:Transposon insertion site sequencing (TIS) is a powerful method for associating genotype with phenotype. However, all TIS methods described to date use short nucleotide sequence reads, which cannot uniquely determine the locations of transposon insertions within repetitive genomic sequences where the repeat units are longer than the read length. To overcome this limitation, we have developed a TIS method based on Oxford Nanopore sequencing technology that generates and uses long nucleotide sequence reads; we call this method LoRTIS (Long Read Transposon Insertion-site Sequencing). The data from this experiment comprise sequence files generated on the Nanopore and Illumina platforms. Biotin1308.fastq.gz and Biotin2508.fastq.gz are FASTQ files generated by Nanopore sequencing. Rep1-Tn.fastq.gz and Rep1-Tn.fastq.gz are FASTQ files generated on the Illumina platform. In this study, we compared the efficiency of the two methods in identifying transposon insertion sites.
Project description:In this work, we collected and analyzed brain mRNAs from two cohorts of mice, young adult and aged adult, and determined their levels using second-generation (Illumina) and third-generation (Oxford Nanopore) sequencing technologies. We report a transcriptome-wide study of differential transcript usage during brain aging. In addition, we provide the community with a large resource of whole-brain transcriptomes and comprehensive analyses that identify widespread diversity of mRNAs during aging.
Project description:The retina, a complex neural tissue, encompasses a vast diversity of over 100 distinct cell types, each characterized by unique features such as morphology, function, location, and transcriptomic profiles. However, the extent of mRNA alternative splicing within individual cell types in the retina remains largely uncharted territory. To bridge this knowledge gap, we employed single-cell RNA sequencing, utilizing both short-read and long-read high-throughput technologies. Our comprehensive analysis profiled the transcriptomes of 29,191 mouse retina cells, resulting in a dataset comprising 1.54 billion Illumina short reads and 1.40 billion Oxford Nanopore long reads. Remarkably, this exploration unveiled a staggering 44,325 transcript isoforms, with 38% of them representing entirely novel discoveries and 17% exhibiting cell-class-specific expression patterns. Intriguingly, while many isoforms were expressed across various cell types, their distribution often displayed variability among them. This research substantially enriches the catalog of transcript isoforms, laying the groundwork for further investigations into the role of alternative splicing in retinal biology and its implications for related diseases. It represents a crucial step toward unraveling the intricacies of retinal function and offers promising insights into the molecular mechanisms underlying retinal disorders.
Project description:Long-read RNA sequencing (RNA-seq) holds great potential for characterizing transcriptome variation and full-length transcript isoforms, but the relatively high error rate of current long-read sequencing platforms poses a major challenge. We present ESPRESSO, a computational tool for robust discovery and quantification of transcript isoforms from error-prone long reads. ESPRESSO jointly considers alignments of all long reads aligned to a gene and uses error profiles of individual reads to improve the identification of splice junctions and the discovery of their corresponding transcript isoforms. On both a synthetic spike-in RNA sample and human RNA samples, ESPRESSO outperforms multiple contemporary tools in not only transcript isoform discovery but also transcript isoform quantification. In total, we generated and analyzed ~1.1 billion nanopore RNA-seq reads covering 30 human tissue samples and three human cell lines. ESPRESSO and its companion dataset provide a useful resource for studying the RNA repertoire of eukaryotic transcriptomes.
Project description:Whole-genome bisulfite sequencing (WGBS) is currently the gold standard for DNA methylation (5-methylcytosine, 5mC) profiling; however, the destructive nature of sodium bisulfite treatment results in DNA fragmentation and subsequent biases in sequencing data. Such issues have led to the development of bisulfite-free methods for 5mC detection. Nanopore sequencing is a long-read, non-destructive approach that directly analyzes DNA and RNA fragments in real time. Recently, computational tools have been developed that enable base-resolution detection of 5mC from Oxford Nanopore sequencing data. In this chapter we provide a detailed protocol for preparation, sequencing, read assembly and analysis of genome-wide 5mC using Nanopore sequencing technologies.
Project description:We developed ONT-cappable-seq, a specialized long-read RNA sequencing technique that allows end-to-end sequencing of primary prokaryotic transcripts using the Nanopore sequencing platform. We applied ONT-cappable-seq to study the transcriptional landscape of Pseudomonas aeruginosa phage LUZ7, leading to a comprehensive genome-wide map of viral transcription start sites, terminators and complex operon structures that fine-tune gene expression. At the same time, the method provides new insights into the RNA biology of LUZ7 and paves the way for more in-depth transcription studies that can help unveil the complex layers of phage-host interactions.