Project description: Polyadenylation at the 3′ end of eukaryotic messenger RNAs enhances mRNA stability and translational efficiency. Global analysis of poly(A) tail lengths may shed light on various aspects of gene regulation. Two NGS-based methods have been introduced for genome-wide poly(A) profiling, and both have revealed human poly(A) profiles with tail lengths shorter than previously thought. However, both methods are technically challenging and difficult to reproduce or adopt widely. Here we present a more straightforward method for poly(A) profiling. Poly(A)-seq performed on an Illumina NextSeq 500 produces single-end 300 nt reads that cover the entirety of poly(A) tails, so poly(A) lengths can be calculated directly from base call data. With Poly(A)-seq we report that the global poly(A) lengths of several human cell lines may be longer than previously reported. We also show that the size selection step during Poly(A)-seq library preparation may greatly affect the sequencing profile, and thus caution should be taken when comparing samples. We hope that wide application of Poly(A)-seq, as a convenient tool, will help bring a better understanding of poly(A) tail properties and functions.
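To illustrate what calculating a poly(A) length directly from base calls could look like, here is a minimal Python sketch. It is not the Poly(A)-seq pipeline: the function name `polya_tail_length`, the `max_non_a` tolerance, and the assumptions that the read is oriented with the tail at its 3′ end and that adapters are already trimmed are all hypothetical choices made for this example.

```python
def polya_tail_length(read: str, max_non_a: int = 1) -> int:
    """Estimate the length of the A-rich stretch at the 3' end of `read`.

    Hypothetical helper: walks backwards from the last base and tolerates
    up to `max_non_a` non-A bases as a crude allowance for base-call
    errors. The returned length spans from the 5'-most A reached during
    the walk to the end of the read, so tolerated non-A bases inside the
    stretch are counted as part of the tail.
    """
    seq = read.upper()
    non_a = 0
    tail_start = len(seq)  # index of the 5'-most A seen so far
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] == "A":
            tail_start = i
        else:
            non_a += 1
            if non_a > max_non_a:
                break  # too many non-A bases: the tail has ended
    return len(seq) - tail_start


if __name__ == "__main__":
    # Transcript body followed by a 25 nt tail.
    print(polya_tail_length("GATTACAGCT" + "A" * 25))
    # Tail with a single miscalled base in the middle.
    print(polya_tail_length("GCGC" + "A" * 10 + "G" + "A" * 10))
```

A production analysis would also need to handle adapter sequence, read orientation, and declining base quality toward the end of long reads, none of which this sketch attempts.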
Project description: Poly(A) tails enhance the stability and translation of most eukaryotic messenger RNAs, but difficulties in globally measuring poly(A)-tail lengths have impeded greater understanding of poly(A)-tail function. Here we describe poly(A)-tail length profiling by sequencing (PAL-seq) and apply it to measure tail lengths of millions of individual RNAs isolated from yeasts, cell lines, Arabidopsis thaliana leaves, mouse liver, and zebrafish and frog embryos. Poly(A)-tail lengths were conserved between orthologous mRNAs, with mRNAs encoding ribosomal proteins and other 'housekeeping' proteins tending to have shorter tails. As expected, tail lengths were coupled to translational efficiencies in early zebrafish and frog embryos. However, this strong coupling diminished at gastrulation and was absent in non-embryonic samples, indicating a rapid developmental switch in the nature of translational control. This switch complements an earlier switch to zygotic transcriptional control and explains why the predominant effect of microRNA-mediated deadenylation concurrently shifts from translational repression to mRNA destabilization. 64 samples from a variety of species.
Project description: While numerous studies have described the transcriptomes of EVs in different cellular contexts, these efforts have typically relied on sequencing methods that require RNA fragmentation, which limits interpretation of the integrity and isoform diversity of EV-encapsulated RNA populations. Furthermore, it has been assumed that mRNA signatures in EVs are likely to be fragmentation products of cellular mRNA, and little is known about the extent to which full-length mRNAs are present within EVs. Using Oxford Nanopore long-read RNA sequencing, we sought to characterize the full-length polyadenylated (poly-A) transcriptome of EVs released by human chronic myelogenous leukemia K562 cells. We detected 441 RNAs that were enriched and 280 that were depleted in EVs. EV-enriched poly-A transcripts consist of a variety of biotypes, including mRNAs, long non-coding RNAs, and pseudogenes. Our analysis revealed that 12.72% of all reads present in EVs corresponded to known full-length transcripts, 65.34% of which were mRNAs. We also observed that for many well-represented coding and non-coding genes, diverse full-length transcript isoforms were present in EV specimens, and these isoforms reflected those in cellular samples but were often present in different ratios. Here we report a full-length transcriptome from human EVs, as determined by long-read nanopore sequencing.
Project description: While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a training data set, which is typically either a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA and using GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 units, and by as much as 13 units at CpG sites. In addition, since reads mapping to the genome are not used for recalibration, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration.
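The idea behind recalibrating against spike-ins can be sketched in a few lines of Python. Because the spike-in sequence is known, every mismatch in a read mapped to it can be counted as a sequencing error, and an empirical quality can be computed for each reported quality value using the standard Phred relation Q = -10 log10(p). This is only an illustration of the principle, not GATK's implementation: the function names, the `(reported_q, is_mismatch)` input format, and the add-one smoothing are assumptions made for the example.

```python
import math
from collections import defaultdict


def phred(p_error: float) -> float:
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(p_error)


def empirical_quality(observations):
    """Empirical Phred quality per reported quality value.

    `observations` is an iterable of (reported_q, is_mismatch) pairs, one
    per base aligned to a spike-in of known sequence (hypothetical input
    format). Add-one smoothing keeps bins with zero errors finite; this
    is an assumption of the sketch, not a description of GATK.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for reported_q, is_mismatch in observations:
        totals[reported_q] += 1
        errors[reported_q] += 1 if is_mismatch else 0
    return {
        q: phred((errors[q] + 1) / (totals[q] + 2))
        for q in totals
    }


if __name__ == "__main__":
    # 998 bases reported at Q30, 9 of which mismatch the spike-in.
    obs = [(30, False)] * 989 + [(30, True)] * 9
    for q, q_emp in empirical_quality(obs).items():
        print(f"reported Q{q} -> empirical Q{q_emp:.1f}")
```

In this toy input the reported score overstates accuracy: the smoothed error rate is 10/1000, which corresponds to an empirical quality of about 20 rather than 30. A real recalibration also conditions on covariates such as machine cycle and dinucleotide context, which is how context-specific biases like those described above are detected.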