Project description:Objectives: To perform long-read transcriptome and proteome profiling of pathogen-stimulated peripheral blood mononuclear cells (PBMCs) from healthy donors. We aim to discover new transcripts and protein isoforms expressed during immune responses to diverse pathogens. Methods: PBMCs were exposed for 24 hours to four microbial stimuli: the TLR4 ligand lipopolysaccharide (LPS), the TLR3 ligand Poly(I:C), heat-inactivated Staphylococcus aureus, and Candida albicans, with RPMI medium as the negative control. Long-read sequencing (PacBio) of one donor and secretome proteomics and short-read sequencing of five donors were performed. IsoQuant was used for transcriptome construction, Metamorpheus/FlashLFQ for proteome analysis, and Illumina short-read 3’-end mRNA sequencing for transcript quantification. Results: Long-read transcriptome profiling reveals the expression of novel sequences and isoform switching induced upon pathogen stimulation, including transcripts that are difficult to detect using traditional short-read sequencing. We observe widespread loss of intron retention as a common result of all pathogen stimulations. We highlight novel transcripts of NFKB1 and CASP1 that may indicate novel immunological mechanisms. In general, RNA expression differences did not result in differences in the amounts of secreted proteins. Interindividual differences in the proteome were larger than the differences between stimulated and unstimulated PBMCs. Clustering analysis of secreted proteins revealed a correlation between chemokine (receptor) expression on the RNA and protein levels in C. albicans- and Poly(I:C)-stimulated PBMCs. Conclusion: Isoform-aware long-read sequencing of pathogen-stimulated immune cells highlights the potential of these methods to identify novel transcripts, revealing a more complex transcriptome landscape than previously appreciated.
Project description:The human neural retina is enriched for alternative splicing, and it is estimated that more than 10% of variants associated with inherited retinal diseases (IRDs) alter splicing. Previous research mainly used short-read RNA-sequencing techniques to investigate retina-specific splicing and splicing factors. However, this technique provides limited information about transcript isoforms. To gain a deeper understanding of the human neural retina and its isoforms, we generated a proteogenomic atlas that combined PacBio long-read RNA-sequencing data with mass-spectrometry and whole-genome sequencing data from three healthy human neural retina samples. RNA-sequencing revealed that one-third of all transcripts were novel, and for IRD-associated genes, even 43% were novel. The most common novel elements of these transcripts were alternative poly(A) sites, exon elongation, and intron retention. Some novel elements affect the non-coding region, but for more than 50% of the novel transcripts a novel open reading frame was predicted. Using proteomics, ten novel peptides confirmed novel isoforms in five genes. Additionally, we found novel isoforms of IMPDH1, an IRD-associated gene, with supporting peptide evidence. This study provides a comprehensive overview of the transcript and protein isoforms expressed in the healthy human neural retina. Moreover, it highlights the importance of studying tissue-specific transcriptomes in greater detail to better understand tissue-specific regulation and to identify disease-causing variants.
Project description:Long-read Nanopore cDNA sequencing of polyA-enriched RNA was implemented in a range of adult tissues isolated from cattle, pig, and chicken. These data were used to identify and characterize the expression patterns of full-length transcript isoforms.
Project description:The transcriptome profiles of the model plant Arabidopsis thaliana have been extensively studied and characterised under different developmental and physiological conditions. However, most of these “RNA-sequencing” datasets have been generated by sequencing reverse-transcribed cDNAs from mRNAs, with relatively short read lengths. Here, we performed direct RNA sequencing using the latest Oxford Nanopore Technology (ONT), which yields unusually long reads. We demonstrate that the complexity of the A. thaliana transcriptomes has been under-estimated. The ONT direct RNA sequencing technology identified transcript isoforms at a vegetative stage (14-day-old seedlings, stage 1.04) and a reproductive stage (stage 6.00-6) when 10% of the flowers had opened. In-house software called TrackCluster was used to determine alternative transcription initiation (ATI), possible alternative polyadenylation (APA), poly(A) length, alternative splicing (AS), and fusion transcripts. Tombo software was used to detect RNA base modifications. More than 38,500 novel transcript isoforms were identified, including six categories of fusion transcripts, which may result from differential RNA processing mechanisms. Fusion transcripts are prone to mis-assembly when sequenced with short reads using next-generation sequencing (NGS). These new transcript isoforms provide important additions to the annotated Arabidopsis genome. The power of ONT in detecting RNA modifications was demonstrated by characterising the differences in modifications between mobile mRNAs and total mRNAs. The mobile mRNAs were enriched in m5C modifications, which is consistent with a recent finding that m5C modification in mRNAs is crucial for their long-distance movement. In summary, ONT direct RNA sequencing greatly enhances the identification of novel RNA transcript isoforms and RNA base modifications.
Project description:Long-read RNA sequencing (RNA-seq) holds great potential for characterizing transcriptome variation and full-length transcript isoforms, but the relatively high error rate of current long-read sequencing platforms poses a major challenge. We present ESPRESSO, a computational tool for robust discovery and quantification of transcript isoforms from error-prone long reads. ESPRESSO jointly considers alignments of all long reads aligned to a gene and uses error profiles of individual reads to improve the identification of splice junctions and the discovery of their corresponding transcript isoforms. On both a synthetic spike-in RNA sample and human RNA samples, ESPRESSO outperforms multiple contemporary tools in not only transcript isoform discovery but also transcript isoform quantification. In total, we generated and analyzed ~1.1 billion nanopore RNA-seq reads covering 30 human tissue samples and three human cell lines. ESPRESSO and its companion dataset provide a useful resource for studying the RNA repertoire of eukaryotic transcriptomes.
Project description:In this work, we generated an MHCC97H proteome dataset by in-gel digestion on an Orbitrap Fusion Lumos. We systematically compared RNC-seq and Ribo-seq in the context of proteome identification, especially when identifying protein isoforms from AS. We also demonstrated that the single-molecule long-read sequencing technique identified thousands of new splice variants and guided the MS identifications of new protein isoforms.
Project description:To identify aberrant splicing isoforms and potential neoantigens, we performed full-length cDNA sequencing of lung adenocarcinoma cell lines using the MinION long-read sequencer. We constructed a comprehensive catalog of aberrant splicing isoforms and detected isoform-specific peptides using proteome analysis.
Project description:Deregulated gene expression is a hallmark of cancer; however, most studies to date have analyzed short-read RNA-sequencing data with inherent limitations. Here, we combine PacBio long-read isoform sequencing (Iso-Seq) and Illumina paired-end short-read RNA sequencing to comprehensively survey the transcriptome of gastric cancer (GC), a leading cause of global cancer mortality. We performed full-length transcriptome analysis across 10 GC cell lines covering four major GC molecular subtypes (chromosomal unstable, Epstein-Barr positive, genome stable and microsatellite unstable). We identify 60,239 non-redundant full-length transcripts, of which >66% are novel compared to current transcriptome databases. Novel isoforms are more likely to be cell-line and subtype specific, expressed at lower levels, with a larger number of exons and longer isoform/coding sequence lengths. Most novel isoforms utilize an alternate first exon, and compared to other alternative splicing categories are expressed at higher levels and exhibit higher variability. Collectively, we observe alternate promoter usage in 25% of detected genes, with the majority (84.2%) of known/novel promoter pairs exhibiting potential changes in their coding sequences. Mapping these alternate promoters to TCGA GC samples, we identify several cancer-associated isoforms, including novel variants of oncogenes. Tumor-specific transcript isoforms tend to alter protein coding sequences to a larger extent than other isoforms. Analysis of outcome data suggests that novel isoforms may impart additional prognostic information. Our results provide a rich resource of full-length transcriptome data for deeper studies of GC and other gastrointestinal malignancies.
Project description:Alternative splicing (AS) isoforms create numerous proteoforms, expanding the complexity of the genome. Highly similar sequences, incomplete reference databases and the insufficient sequence coverage of mass spectrometry limit the identification of AS proteoforms. In this work, we compared RNC-seq and Ribo-seq in the context of proteome identification, especially when identifying protein isoforms from AS. We also demonstrated that the single-molecule long-read sequencing technique identified thousands of new splice variants and guided the MS identifications of new protein isoforms.
Project description:Alternative splicing contributes to transcriptomic complexity and plays a role in the regulation of cellular identity and function, but the correct assembly of transcripts of complex loci as well as their quantification based on short-read sequencing is non-trivial. Recent long-read sequencing methods such as those from ONT and PacBio overcome these problems by potentially sequencing full transcripts. The activation of brown adipose tissue, e.g., by reduced ambient temperature (cold) exposure, positively affects metabolism by increasing energy expenditure and releasing endocrine factors and has been shown to involve specific alternative splicing events. Here we assessed important features of ONT long-read sequencing protocols in relation to Illumina short-read sequencing: (i) alignment characteristics to the reference genome and transcriptome, (ii) gene and transcript detection and quantification, (iii) detection of differential gene and transcript expression events, (iv) transcriptome reannotation and (v) detection of differential transcript usage events. We find that ONT long-read sequencing is advantageous in terms of transcriptome reassembly, especially when the reads are enriched for full-length reads. Illumina sequencing, due to the higher number of counts available, has a higher statistical power for calling differentially expressed/used features, whereas long-read sequencing has a lower risk of calling false positive events due to the better ability to unambiguously map reads to transcripts. Finally, we describe novel transcript isoforms in cold-activated murine iBAT reassembled from ONT long reads.