Project description:These data correspond to one SMRT cell sequencing run (performed on Sequel II, PacBio) of full-length cDNAs from 3 pooled glioma stem cell line libraries. No tag was added to distinguish the 3 different samples.
Project description:Deregulated gene expression is a hallmark of cancer; however, most studies to date have analyzed short-read RNA-sequencing data with inherent limitations. Here, we combine PacBio long-read isoform sequencing (Iso-Seq) and Illumina paired-end short-read RNA sequencing to comprehensively survey the transcriptome of gastric cancer (GC), a leading cause of global cancer mortality. We performed full-length transcriptome analysis across 10 GC cell lines covering four major GC molecular subtypes (chromosomal unstable, Epstein-Barr positive, genome stable and microsatellite unstable). We identify 60,239 non-redundant full-length transcripts, of which >66% are novel compared to current transcriptome databases. Novel isoforms are more likely to be cell-line and subtype specific, and are expressed at lower levels, with a larger number of exons and longer isoform/coding sequence lengths. Most novel isoforms utilize an alternate first exon, and compared to other alternative splicing categories are expressed at higher levels and exhibit higher variability. Collectively, we observe alternate promoter usage in 25% of detected genes, with the majority (84.2%) of known/novel promoter pairs exhibiting potential changes in their coding sequences. Mapping these alternate promoters to TCGA GC samples, we identify several cancer-associated isoforms, including novel variants of oncogenes. Tumor-specific transcript isoforms tend to alter protein coding sequences to a larger extent than other isoforms. Analysis of outcome data suggests that novel isoforms may impart additional prognostic information. Our results provide a rich resource of full-length transcriptome data for deeper studies of GC and other gastrointestinal malignancies.
Project description:Full-Length cDNA transcriptome (Iso-Seq) data sequenced on the PacBio Sequel system using 2.1 chemistry. Multiplexed cDNA library of 12 samples (3 tissues x 4 strains). Tissues: root, embryo, endosperm. Strains: B73, Ki11, B73xKi11, Ki11xB73.
Project description:Zea mays is a leading model for elucidating transcriptional networks in plants, aided by increasingly refined studies of the transcriptome atlas across spatio-temporal, developmental, and environmental dimensions. Limiting this progress are uncertainties about the complete structure of mRNA transcripts, particularly with respect to alternatively spliced isoforms. Although second-generation RNA-seq provides a quantitative assay for transcriptional and posttranscriptional events, the accurate reconstruction of full-length mRNA isoforms is challenging with short-read technologies. By producing much longer reads, third-generation sequencing offers to solve the assembly problem, but can suffer from lower read accuracy and throughput. Here, we combine these complementary technologies to define and quantify high-confidence transcript isoforms in maize. Six tissues (root, pollen, embryo, endosperm, immature ear, and immature tassel) of the B73 inbred line were used for mRNA sequencing with the Illumina HiSeq 2000 PE101 platform to comprehensively quantitate gene/isoform expression. In parallel, intact cDNAs from the same samples were sequenced using the PacBio RS II platform. The latter used six size-fractionated libraries (<1kb, 1-2kb, 2-3kb, 3kb-5kb, 4-6kb, >5kb) to generate more than 2 million full-length reads. Preliminary findings suggest that mechanisms of alternative splicing are differentially employed across tissues. In addition, these data show promise to dramatically improve the status of maize genome annotation, with the detection of previously unidentified transcript isoforms and the uncovering of previously unrecognized genes. This submission contains the Illumina HiSeq 2000 PE101 reads.
Project description:Third-generation sequencing [Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)] is being used in a rapidly growing number of studies across many research areas. Among these, plant full-length single-molecule transcriptome studies have mostly used PacBio, while ONT has rarely been used. Therefore, in this study, we developed ONT RNA-sequencing methods for plants. We performed a detailed evaluation of reads from PacBio and Nanopore PCR cDNA (ONT Pc) sequencing in a plant (Arabidopsis), including the characteristics of the raw data and the identification of transcripts. We aimed to provide a valuable reference for applications of ONT in plant transcriptome analysis.