Dataset Information

Accurate isoform quantification by joint short- and long-read RNA-sequencing [long reads]

ABSTRACT: Accurate quantification of transcript isoforms is crucial for understanding gene regulation, functional diversity, and cellular behavior. Existing methods using either short-read (SR) or long-read (LR) RNA sequencing have significant limitations: SR sequencing provides high depth but struggles with isoform deconvolution, while LR sequencing offers isoform resolution at the cost of lower depth, higher noise, and technical biases. Addressing this gap, we introduce Multi-Platform Aggregation and Quantification of Transcripts (MPAQT), a generative model that combines the complementary strengths of different sequencing platforms to achieve state-of-the-art isoform-resolved transcript quantification, as demonstrated by extensive simulations and experimental benchmarks. Applying MPAQT to an in vitro model of human embryonic stem cell differentiation into cortical neurons, followed by machine learning-based modeling of mRNA abundance determinants, reveals the role of untranslated regions (UTRs) in isoform regulation through isoform-specific interactions with RNA-binding proteins that modulate mRNA stability. These findings highlight MPAQT's potential to enhance our understanding of transcriptomic complexity and underline the role of splicing-independent post-transcriptional mechanisms in shaping the isoform and exon usage landscape of the cell.
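The abstract does not describe MPAQT's model in enough detail to reproduce it. The sketch below therefore only illustrates the general idea that reads from two platforms can inform one shared set of isoform abundances. It is a toy expectation-maximization (EM) estimator, not MPAQT. The function `joint_em`, the compatibility-class input format, and the assumption that both platforms sample molecules in proportion to abundance (no length bias, truncation, or platform-specific noise) are all hypothetical simplifications.

```python
from typing import Dict, FrozenSet, List

# Hypothetical input format: each compatibility class is the set of isoforms a
# group of reads is consistent with, mapped to the number of reads in that group.
Classes = Dict[FrozenSet[str], int]


def joint_em(isoforms: List[str], sr: Classes, lr: Classes,
             n_iter: int = 1000, tol: float = 1e-12) -> Dict[str, float]:
    """Toy EM for isoform fractions shared by short-read (sr) and long-read (lr) data.

    Assumption (not MPAQT's model): both platforms sample molecules in proportion
    to isoform abundance, with no length bias, truncation, or error model.
    """
    theta = {t: 1.0 / len(isoforms) for t in isoforms}  # uniform starting fractions
    total = sum(sr.values()) + sum(lr.values())
    for _ in range(n_iter):
        # E-step: split each class's reads among its compatible isoforms
        # in proportion to the current abundance estimates.
        expected = {t: 0.0 for t in isoforms}
        for platform in (sr, lr):
            for compat, count in platform.items():
                denom = sum(theta[t] for t in compat)
                if denom == 0.0:
                    continue
                for t in compat:
                    expected[t] += count * theta[t] / denom
        # M-step: new fractions are the expected read shares pooled over platforms.
        new_theta = {t: expected[t] / total for t in isoforms}
        delta = max(abs(new_theta[t] - theta[t]) for t in isoforms)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Short reads are mostly ambiguous; long reads resolve the isoforms directly.
sr_classes = {frozenset({"A", "B"}): 800, frozenset({"A"}): 100, frozenset({"B"}): 100}
lr_classes = {frozenset({"A"}): 60, frozenset({"B"}): 20}
theta = joint_em(["A", "B"], sr_classes, lr_classes)
print(round(theta["A"], 3), round(theta["B"], 3))
```

In this example the unique short reads split 100/100 and 800 short reads stay ambiguous, while the long reads favour isoform A 60 to 20; pooling both platforms gives A a fraction of 4/7 (about 0.571). A realistic model would replace the shared sampling assumption with platform-specific likelihoods, which is where the depth, noise, and bias differences named in the abstract come in.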

ORGANISM(S): Homo sapiens

PROVIDER: GSE271527 | GEO | 2024/07/10

REPOSITORIES: GEO


Similar Datasets

Project description: Transcription and translation are intertwined processes in which mRNA isoforms are crucial intermediaries. However, methodological limitations in analyzing translation at the mRNA isoform level have impaired our ability to comprehensively link full-length transcripts to the translatome. This has left gaps in our understanding of critical biological processes, regulatory mechanisms, and disease progression. To address this, we develop an integrated computational and experimental framework called long-read Ribo-STAMP (LR-Ribo-STAMP). LR-Ribo-STAMP capitalizes on advancements in long-read sequencing and RNA base editing-mediated technologies to simultaneously and scalably profile translation and transcription at both the gene and mRNA isoform levels for the first time. In this report, we show agreement between gene-level translation profiles obtained with LR-Ribo-STAMP and those from previously validated short-read Ribo-STAMP data in unperturbed cells. At the mRNA isoform level, we show that LR-Ribo-STAMP successfully profiles translation in unperturbed cells and links mRNA isoforms and regulatory features, such as upstream ORFs (uORFs) and regulatory sequences, to translation measurements. We further demonstrate the method’s effectiveness in disease models by profiling translation at the gene and isoform levels in a triple-negative breast cancer cell line under normoxia and hypoxia. Here, we find that LR-Ribo-STAMP effectively delineates orthogonal transcriptional and translational shifts between conditions at both the gene and isoform levels. At the isoform level, LR-Ribo-STAMP uniquely identifies key regulatory elements and shifts in mRNA isoform transcription that correlate with changes in translation, providing the kind of insight that can inform the development of novel therapeutics.
Overall, LR-Ribo-STAMP is a significant advancement in translation methods and can have profound implications for basic research and clinical applications.

Project description: Long-read sequencing has become a powerful tool for alternative splicing analysis. However, technical and computational challenges have limited our ability to couple long-read sequencing with single cell and spatial barcoding to explore alternative splicing in the single cell and spatial setting. Although Nanopore-based long-read sequencing has been widely applied to single cell and spatially barcoded libraries in recent research, technical issues that hinder accurate single cell isoform-level quantification remain poorly addressed in such settings. First, the relatively high sequencing error rate of Nanopore long reads hinders the recovery of cell barcodes and unique molecular identifiers (UMIs), a necessary first step in the analysis of single cell/spatial sequencing data. Second, read truncation and mapping errors, the latter exacerbated by the high sequencing error rate, further degrade quantification accuracy and lead to the false detection of spurious new isoforms. We show that these technical issues persist despite recent improvements in long-read sequencing accuracy. Beyond initial data pre-processing, downstream analysis lacks a statistical framework to quantify splicing variation within and between cells/spots. In light of these challenges, we developed Longcell, a statistical framework and computational pipeline for isoform quantification using single cell and spatially barcoded Nanopore long-read sequencing data. Longcell performs computationally efficient cell/spot barcode extraction, UMI recovery, and UMI-based correction of truncation and mapping errors.
Through a statistical model that accounts for varying read coverage across cells/spots, Longcell rigorously quantifies the level of inter-cell/spot versus intra-cell/spot diversity in exon usage and detects changes in splicing distributions between cell populations. Applying Longcell to single cell long-read data from multiple contexts, we found that intra-cell splicing heterogeneity, where multiple isoforms co-exist within the same cell, is ubiquitous for highly expressed genes. On matched single cell and Visium long-read sequencing data for a colorectal cancer metastasis to the liver, Longcell found concordant signals between the single cell and spatial data modalities. On Visium long-read sequencing data for multiple tissues, Longcell allows accurate identification of spatial isoform switching. Finally, in a perturbation experiment targeting 9 splicing factors, Longcell identified regulatory targets that were validated by targeted sequencing.
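The UMI-based correction mentioned in the Longcell description can be illustrated in simplified form. The following is a sketch under stated assumptions, not Longcell's algorithm: the hypothetical `collapse_umis` function takes reads as `(cell_barcode, umi, isoform)` tuples, greedily merges UMIs within one mismatch (Hamming distance, ignoring Nanopore indels), treats each merged UMI cluster as one molecule, and assigns that molecule the majority isoform among its reads so that a minority of truncated or mis-mapped reads is outvoted.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Mismatch count between UMIs; indels are ignored in this sketch."""
    if len(a) != len(b):
        return max(len(a), len(b))  # treat a length mismatch as "far apart"
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(reads: List[Tuple[str, str, str]],
                  max_dist: int = 1) -> Dict[str, Counter]:
    """Hypothetical helper: (cell_barcode, umi, isoform) reads -> per-cell molecule counts."""
    by_cell: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for cell, umi, isoform in reads:
        by_cell[cell].append((umi, isoform))

    result: Dict[str, Counter] = {}
    for cell, items in by_cell.items():
        # Greedy clustering: the most abundant UMIs become cluster centres and
        # rarer UMIs within max_dist mismatches are merged into the first match.
        umi_counts = Counter(umi for umi, _ in items)
        centres: List[str] = []
        assign: Dict[str, str] = {}
        for umi, _ in umi_counts.most_common():
            for centre in centres:
                if hamming(umi, centre) <= max_dist:
                    assign[umi] = centre
                    break
            else:
                centres.append(umi)
                assign[umi] = umi
        # One molecule per cluster, labelled with the majority isoform of its reads.
        votes: Dict[str, Counter] = defaultdict(Counter)
        for umi, isoform in items:
            votes[assign[umi]][isoform] += 1
        result[cell] = Counter(v.most_common(1)[0][0] for v in votes.values())
    return result


reads = [
    ("AAAC", "ACGT", "T1"), ("AAAC", "ACGT", "T1"), ("AAAC", "ACGT", "T1"),
    ("AAAC", "ACGA", "T1"),  # one UMI mismatch: merged with ACGT
    ("AAAC", "ACGT", "T2"),  # e.g. a truncated read assigned to the wrong isoform
    ("AAAC", "TTTT", "T2"), ("AAAC", "TTTT", "T2"),
    ("GGGC", "CCCC", "T1"),
]
print(collapse_umis(reads))
```

Here cell `AAAC` yields one T1 molecule and one T2 molecule: the stray T2 read under UMI `ACGT` is outvoted, and the `ACGA` read is absorbed as a likely UMI sequencing error. Real Nanopore data would need edit-distance (not Hamming) matching and an explicit truncation model.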
