Project description:Accurate annotation of genes and their transcripts is a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, the GENCODE reference collection of long noncoding RNAs remains far from complete: many are fragmentary, while thousands more remain uncatalogued. To accelerate lncRNA annotation, we have developed RNA Capture Long Seq (CLS), combining targeted RNA capture with third-generation long-read sequencing. We present an experimental re-annotation of the entire GENCODE intergenic lncRNA populations in matched human and mouse tissues. CLS approximately doubles the complexity of targeted loci, both in terms of validated splice junctions and transcript models. Through its identification of full-length transcript models, CLS allows the first definitive measurement of promoter features, gene structure and protein-coding potential of lncRNAs. Thus CLS removes a longstanding bottleneck of transcriptome annotation, generating manual-quality full-length transcript models at high-throughput scales.
Project description:Long non-coding RNAs (lncRNAs) constitute a large fraction of mammalian transcriptomes that still remains unexplored, mainly because the lack of comprehensive, high-quality lncRNA annotation limits the possibility of fully exploring their functional capacity. We have developed RACE-seq, an experimental workflow based on RACE (Rapid Amplification of cDNA Ends) and long-read RNA sequencing, aimed at both rare isoform discovery and better definition of gene boundaries. We applied 3' and 5' RACE-seq to 398 low-expressed GENCODE v7 lncRNA genes in seven human tissues (brain, testis, heart, kidney, liver, lung and spleen). The sequences obtained led to the discovery of 2,641 on-target, previously unknown alternative transcripts. Novel isoforms extended 60% of the 398 targeted lncRNA loci further in either the 5' or 3' direction, and often reached genome hallmarks typical of gene boundaries. In parallel, we used nested RACE-seq, and confirmed that it has overwhelmingly better sensitivity than its standard counterpart.
Project description:Long-read sequencing technologies such as Iso-Seq (PacBio Inc.) generate highly accurate sequences of full-length mRNA transcript isoforms. Long-read transcriptomics may be especially useful in the context of lymphocyte functional plasticity as it relates to human health and disease. However, no long-read isoform-aware reference transcriptomes of human circulating lymphocytes seem to be publicly available despite being valuable as benchmarks in a variety of transcriptomic studies. To begin to fill this gap, we purified four lymphocyte subsets (CD4 T, CD8 T, NK, and Pan B cells) from the peripheral blood of a healthy male donor and obtained high-quality RNA (RIN>8) for PacBio Iso-Seq analysis and parallel RNA-Seq analysis.