Project description:Colorectal cancer (CRC) ranks as the second leading cause of cancer deaths globally. In recent years, short-read single-cell RNA sequencing (scRNA-seq) has been instrumental in deciphering tumor heterogeneity. However, these studies enable only gene-level quantification and neglect alterations in transcript structures arising from alternative end processing or splicing. In this study, we integrated short- and long-read scRNA-seq of CRC samples to build an isoform-resolution CRC transcriptomic atlas. We identified 394 dysregulated transcript structures in tumor epithelial cells, including 299 resulting from various combinations of splicing events. We then characterized genes and isoforms associated with epithelial lineages and subpopulations exhibiting distinct prognoses. Among 31,935 isoforms with novel junctions, 330 were supported by The Cancer Genome Atlas RNA-seq and mass spectrometry data. Finally, we built an algorithm that integrated novel peptides derived from open reading frames of recurrent tumor-specific transcripts with mass spectrometry data and identified recurring neoepitopes that may aid the development of cancer vaccines.
Project description:Pioneering studies (PXD014844) have identified many interesting molecules in tick saliva by LC-MS/MS proteomics, but the protein databases used to assign mass spectra were based on short Illumina reads of the Amblyomma americanum transcriptome and may not have captured the diversity and complexity of longer transcripts. Here we apply long-read Pacific Biosciences technology to complement the previously reported short-read Illumina transcriptome-based proteome in an effort to increase spectrum assignments. Our dataset yields a small increase in assignable spectra, supplementing the previously released short-read transcriptome-based proteome.
Project description:Ongoing improvements to next-generation sequencing technologies are leading to longer sequencing read lengths, but a thorough understanding of the impact of longer reads on RNA sequencing analyses is lacking. To address this issue, we generated and compared two RNA sequencing datasets of differing read lengths -- 2x75 bp (L75) and 2x262 bp (L262) -- and investigated the impact of read length on various aspects of analysis, including the performance of currently available read-mapping tools, gene and transcript quantification, and detection of allele-specific expression patterns. Our results indicate that, while the scalability of read-mapping tools and the cost-effectiveness of long-read protocols are issues that require further attention, longer reads enable more accurate quantification of diverse aspects of gene expression, including individual-specific patterns of allele-specific expression and alternative splicing.
Project description:In recent years, long-read sequencing technologies have detected transcript isoforms with unprecedented accuracy and resolution. However, it remains unclear whether long-read sequencing can effectively disentangle the isoform landscape of complex allele-specific loci that arise from genetic or epigenetic differences between alleles. Here, we combine the PacBio Iso-Seq workflow with the established phasing approach WhatsHap to assign long reads to the corresponding allele in polymorphic F1 mouse hybrids. Upon comparing the long-read sequencing results with matched short reads, we observed general consistency in the allele-specific information and were able to confirm the imprinting status of known imprinted genes. We then explored the complex imprinted Gnas locus known for allele-specific non-coding and coding isoforms and were able to benchmark historical observations. This approach also allowed us to detect isoforms from both the active and inactive X chromosomes of genes that escape X chromosome inactivation. The described workflow offers a promising framework and demonstrates the power of long-read transcriptomic data to provide mechanistic insight into complex allele-specific loci.
Project description:Transcription and translation are intertwined processes where mRNA isoforms are crucial intermediaries. However, methodological limitations in analyzing translation at the mRNA isoform level have impaired our ability to comprehensively establish links between the full-length transcripts and the translatome. This has left gaps in our understanding of critical biological processes, regulatory mechanisms, and disease progression. To address this, we develop an integrated computational and experimental framework called long-read Ribo-STAMP (LR-Ribo-STAMP). LR-Ribo-STAMP capitalizes on advancements in long-read sequencing and RNA-base editing-mediated technologies to simultaneously and scalably profile translation and transcription at both gene and mRNA isoform levels for the first time. In this report, we show agreement between gene-level translation profiles obtained with LR-Ribo-STAMP and those from previously validated short-read Ribo-STAMP data in unperturbed cells. At the mRNA isoform level, we show that LR-Ribo-STAMP successfully profiles translation in unperturbed cells and links mRNA isoforms and regulatory features, such as upstream ORFs (uORFs) and regulatory sequences, to translation measurements. We further demonstrate the method’s effectiveness in profiling disease models by profiling translation at gene and isoform levels in a triple-negative breast cancer cell line under normoxia and hypoxia. Here, we find that LR-Ribo-STAMP effectively delineates orthogonal transcriptional and translational shifts between conditions at gene and isoform levels. At the isoform level, LR-Ribo-STAMP uniquely identifies key regulatory elements and shifts in mRNA isoform transcription that correlate with changes in translation, providing an example of insight that can inform the development of novel therapeutics.
Overall, LR-Ribo-STAMP is a significant advancement in translation methods and can have profound implications for basic research and clinical applications.
Project description:In this study, we used a barcoding-based synthetic long-read (SLR) isoform sequencing approach (LoopSeq) to generate sequencing reads sufficiently long and accurate to identify isoforms using standard short-read Illumina sequencers.