Project description:To examine the mechanisms that control flower development, we sequenced the flower bud transcriptomes of ‘High Noon’, a reblooming cultivar of P. suffruticosa × P. lutea. Both full-length isoforms and RNA-seq were sequenced in 3 floral developmental stages. A total of 15.94 Gb raw data and 457.0 million reads were generated in full-length transcript sequencing and RNA-seq.
Project description:Accurate annotation of transcript isoforms is crucial to understand gene functions, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data remain imprecise. We developed Bookend, a software package for transcript assembly that incorporates data from different RNA-seq techniques, with a focus on identifying and utilizing RNA 5′ and 3′ ends. Through end-guided assembly with Bookend we demonstrate that correct modeling of transcript start and end sites is essential for precise transcript assembly. Furthermore, we discovered that utilization of end-labeled reads present in full-length single-cell RNA-seq (scRNA-seq) datasets dramatically improves the precision of transcript assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq datasets from Arabidopsis, as well as meta-assembly of RNA-seq from single mouse embryonic stem cells (mESCs) can produce end-to-end transcript annotations of comparable quality to reference annotations in these model organisms.
2022-01-08 | GSE189482 | GEO
Project description:Full-length transcript sequencing of Paralichthys olivaceus
Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short-reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each technology has its distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than annotated ones. The TALON pipeline for technology-agnostic, long-read transcriptome discovery and quantification tracks both known and novel transcript models as well as expression levels across datasets for both simple studies and larger projects such as ENCODE that seek to decode transcriptional regulation in the human and mouse genomes to predict more accurate expression levels of genes and transcripts than possible with short-reads alone.
Project description:We use a laser-induced forward transfer (LIFT)-based method coupled with a full-length mRNA-sequencing protocol (LIFT-seq) for profiling region-specific tissues. LIFT-seq was applied to the cortex and hippocampus regions of the mouse brain for profiling gene expression.
Project description:We consturcted a full-length transcript reference of CD4SP and CD8SP T cells using TGS (PacBio) data. We then used SGS (Illumina) CD4SP and CD8SP data to quantify isoform expressions and study alternative splicing patterns against the full-length transcript reference between CD4SP and CD8SP cells.
Project description:Long-read RNA sequencing (RNA-seq) holds great potential for characterizing transcriptome variation and full-length transcript isoforms, but the relatively high error rate of current long-read sequencing platforms poses a major challenge. We present ESPRESSO, a computational tool for robust discovery and quantification of transcript isoforms from error-prone long reads. ESPRESSO jointly considers alignments of all long reads aligned to a gene and uses error profiles of individual reads to improve the identification of splice junctions and the discovery of their corresponding transcript isoforms. On both a synthetic spike-in RNA sample and human RNA samples, ESPRESSO outperforms multiple contemporary tools in not only transcript isoform discovery but also transcript isoform quantification. In total, we generated and analyzed ~1.1 billion nanopore RNA-seq reads covering 30 human tissue samples and three human cell lines. ESPRESSO and its companion dataset provide a useful resource for studying the RNA repertoire of eukaryotic transcriptomes.
Project description:Accurate annotations of genes and their transcripts is a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, the GENCODE reference collection of long noncoding RNAs remains far from complete: many are fragmentary, while thousands more remain uncatalogued. To accelerate lncRNA annotation, we have developed RNA Capture Long Seq (CLS), combining targeted RNA capture with third generation long-read sequencing. We present an experimental re-annotation of the entire GENCODE intergenic lncRNA populations in matched human and mouse tissues. CLS approximately doubles the complexity of targeted loci, both in terms of validated splice junctions and transcript models. Through its identification of full-length transcript models, CLS allows the first definitive measurement of promoter features, gene structure and protein-coding potential of lncRNAs. Thus CLS removes a longstanding bottleneck of transcriptome annotation, generating manual-quality full-length transcript models at high-throughput scales.