Project description:Pseudogenes are gene copies presumed to mainly be functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. When transcribed, pseudogenes may encode proteins or enact RNA-intrinsic regulatory mechanisms. However, the extent, characteristics and functional relevance of the human pseudogene transcriptome are unclear. Short-read sequencing platforms have limited power to resolve and accurately quantify pseudogene transcripts owing to the high sequence similarity of pseudogenes and their parent genes. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify here hundreds of novel transcribed pseudogenes. Pseudogene transcripts are expressed in tissue-specific patterns, exhibit complex splicing patterns and contribute to the coding sequences of known genes. We survey pseudogene transcripts encoding intact open reading frames (ORFs), representing potential unannotated protein-coding genes, and demonstrate their efficient translation in cultured cells. To assess the impact of noncoding pseudogenes on the cellular transcriptome, we delete the nucleus-enriched pseudogene PDCL3P4 transcript from HAP1 cells and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the transcriptional landscape underpinning human biology and disease.
Project description:Adenovirus is a common human pathogen that relies on host cell processes for transcription and processing of viral RNA and protein production. Although adenoviral promoters, splice junctions, and cleavage and polyadenylation sites have been characterized using low-throughput biochemical techniques or short-read cDNA-based sequencing, these technologies do not fully capture the complexity of the adenoviral transcriptome. By combining Illumina short-read and nanopore long-read direct RNA sequencing approaches, we mapped transcription start sites and cleavage and polyadenylation sites across the adenovirus genome. In addition to confirming the known canonical viral early and late RNA cassettes, our analysis of splice junctions within long RNA reads revealed an additional 35 novel viral transcripts. These RNAs include fourteen new splice junctions which lead to expression of canonical open reading frames (ORFs), six novel ORF-containing transcripts, and fifteen transcripts encoding messages that potentially alter protein functions through truncations or fusion of canonical ORFs. We also detect RNAs that bypass canonical cleavage sites and generate potential chimeric proteins by linking separate gene transcription units. Of these, an evolutionarily conserved protein was detected containing the N-terminus of E4orf6 fused to the downstream DBP/E2A ORF. Loss of this novel protein, E4orf6/DBP, was associated with aberrant viral replication center morphology and poor viral spread. Our work highlights how long-read sequencing technologies can reveal further complexity within viral transcriptomes.
Project description:To identify aberrant splicing isoforms and potential neoantigens, we performed full-length cDNA sequencing of lung adenocarcinoma cell lines using the MinION long-read sequencer. We constructed a comprehensive catalog of aberrant splicing isoforms and detected isoform-specific peptides using proteome analysis.
Project description:Objectives: To perform long-read transcriptome and proteome profiling of pathogen-stimulated peripheral blood mononuclear cells (PBMCs) from healthy donors. We aim to discover new transcripts and protein isoforms expressed during immune responses to diverse pathogens. Methods: PBMCs were exposed for 24 hours to four microbial stimuli: the TLR4 ligand lipopolysaccharide (LPS), the TLR3 ligand Poly(I:C), heat-inactivated Staphylococcus aureus, and Candida albicans, with RPMI medium as a negative control. Long-read sequencing (PacBio) of one donor and secretome proteomics and short-read sequencing of five donors were performed. IsoQuant was used for transcriptome construction, Metamorpheus/FlashLFQ for proteome analysis, and Illumina short-read 3’-end mRNA sequencing for transcript quantification. Results: Long-read transcriptome profiling reveals the expression of novel sequences and isoform switching induced upon pathogen stimulation, including transcripts that are difficult to detect using traditional short-read sequencing. We observe widespread loss of intron retention as a common result of all pathogen stimulations. We highlight novel transcripts of NFKB1 and CASP1 that may indicate novel immunological mechanisms. In general, RNA expression differences did not result in differences in the amounts of secreted proteins. Interindividual differences in the proteome were larger than the differences between stimulated and unstimulated PBMCs. Clustering analysis of secreted proteins revealed a correlation between chemokine (receptor) expression on the RNA and protein levels in C. albicans- and Poly(I:C)-stimulated PBMCs. Conclusion: Isoform-aware long-read sequencing of pathogen-stimulated immune cells highlights the potential of this method to identify novel transcripts, revealing a more complex transcriptome landscape than previously appreciated.
Project description:Long-read SMRT cDNA sequencing of nascent RNA from exponentially growing S. cerevisiae and S. pombe cells was employed to obtain transcription elongation and splicing information from single transcripts. Nascent RNA was prepared from the yeast chromatin fraction (Carrillo Oesterreich, Preibisch, Neugebauer, Mol Cell 2010). The nascent 3′ end was labeled with a 3′ DNA adaptor through ligation. The adaptor sequence served as template for full-length reverse transcription and double-stranded cDNA was obtained in a PCR (gene-specific or transcriptome-wide). SMRT DNA sequencing libraries were prepared subsequently. Nascent RNA profiles for mainly intron-containing genes were generated with long-read SMRT cDNA sequencing.
Project description:While numerous studies have described the transcriptomes of EVs in different cellular contexts, these efforts have typically relied on sequencing methods requiring RNA fragmentation, which limits interpretation of the integrity and isoform diversity of EV-encapsulated RNA populations. Furthermore, it has been assumed that mRNA signatures in EVs are likely to be fragmentation products of the cellular mRNA material, and little is known about the extent to which full-length mRNAs are present within EVs. Using Oxford Nanopore long-read RNA sequencing, we sought to characterize the full-length polyadenylated (poly-A) transcriptome of EVs released by human chronic myelogenous leukemia K562 cells. We detected 441 and 280 RNAs that were respectively enriched or depleted in EVs. EV-enriched poly-A transcripts consist of a variety of biotypes, including mRNAs, long non-coding RNAs, and pseudogenes. Our analysis revealed that 12.72% of all reads present in EVs corresponded to known full-length transcripts, 65.34% of which were mRNAs. We also observed that for many well-represented coding and non-coding genes, diverse full-length transcript isoforms were present in EV specimens, and that these isoforms reflected those in cellular samples but were often present in different ratios. Here we report a full-length transcriptome from human EVs, as determined by long-read nanopore sequencing.
Project description:Pseudogenes are defined as regions of the genome that resemble functional genes but contain disabling mutations and lack regulatory elements needed for transcription or translation. They are excellent markers for genome evolution and are emerging as crucial regulators of development and disease, especially cancer. However, the systematic functional characterization and evolution of pseudogenes remain largely unexplored. In particular, the contribution of pseudogenes to organ development is still unknown. Meanwhile, the study of pseudogene transcription, which is the first step in generating functional RNA, is precluded by the limited capacity of short-read sequencing. To address these issues, we systematically inferred the origin time and characterized the evolution pattern of pseudogenes. Leveraging PacBio full-length sequencing in combination with deep Illumina data and public developmental time-course RNA-seq, we substantially expanded the set of analyzed samples and profiled genome-wide pseudogene expression. Additionally, we prioritized functional pseudogenes at multiple regulatory layers and determined their implications in disease and cancer biology.