Project description: High-throughput RNA sequencing (RNA-seq) using cDNA has played a key role in delineating transcriptome complexity, including alternative transcription initiation, splicing, polyadenylation and base modification. However, the reads derived from current RNA-seq technologies are usually short and lose information on modification during reverse transcription, compromising their potential in defining transcriptome complexity. Here we applied a direct RNA sequencing method with ultra-long reads from Oxford Nanopore Technologies (ONT) to study the transcriptome complexity in C. elegans. We sequenced native poly(A)-tailed mRNAs, generating approximately six million reads from embryos, L1 larvae and young adult animals, with average read lengths ranging from 900 to 1,100 bp across stages. Around half of the reads represent full-length transcripts, judged by the presence of a spliced leader or by their full coverage of an existing transcript. To take advantage of the full-length transcripts in defining transcriptome complexity, we devised an algorithm that either predicts novel isoforms from the reads or groups the reads with existing isoforms using their mapping tracks rather than the existing intron/exon structures, which allowed us to identify roughly 57,000 novel isoforms and recover at least 26,000 of the 33,500 existing isoforms. Intriguingly, stage-specific expression at the gene level and at the isoform level shows little correlation. Finally, we observed an elevated level of modification in all bases in the coding region relative to the UTR. Taken together, the ONT long reads are expected to deliver new insights into RNA processing and modification and their underlying biology.
Project description: Massively parallel sequencing of polyadenylated RNAs has played a key role in delineating transcriptome complexity, including alternative use of an exon, promoter, 5' or 3' splice site or polyadenylation site, and RNA modification. However, reads derived from current RNA-seq technologies are usually short and lack information on modification, compromising their potential in defining transcriptome complexity. Here, we applied a direct RNA sequencing method with ultralong reads using Oxford Nanopore Technologies to study the transcriptome complexity in Caenorhabditis elegans. We generated approximately six million reads using native poly(A)-tailed mRNAs from three developmental stages, with average read lengths ranging from 900 to 1100 nt. Around half of the reads represent full-length transcripts. To utilize the full-length transcripts in defining transcriptome complexity, we devised a method to classify each long read as either matching an existing transcript or representing a novel transcript, using sequence mapping tracks rather than existing intron/exon structures, which allowed us to identify roughly 57,000 novel isoforms and recover at least 26,000 of the 33,500 existing isoforms. The sets of genes with differential expression and those with differential isoform usage over development are largely different, implying fine-tuned regulation at the isoform level. We also observed an unexpected increase in putative RNA modification in all bases in the coding region relative to the UTR, suggesting possible roles for these modifications in translation. The RNA reads and the method for read classification are expected to deliver new insights into RNA processing and modification and their underlying biology in the future.