Project description: In recent years, long-read sequencing technologies have detected transcript isoforms with unprecedented accuracy and resolution. However, it remains unclear whether long-read sequencing can effectively disentangle the isoform landscape of complex allele-specific loci that arise from genetic or epigenetic differences between alleles. Here, we combine the PacBio Iso-Seq workflow with the established phasing approach WhatsHap to assign long reads to the corresponding allele in polymorphic F1 mouse hybrids. Upon comparing the long-read sequencing results with matched short reads, we observed general consistency in the allele-specific information and were able to confirm the imprinting status of known imprinted genes. We then explored the complex imprinted Gnas locus known for allele-specific non-coding and coding isoforms and were able to benchmark historical observations. This approach also allowed us to detect isoforms from both the active and inactive X chromosomes of genes that escape X chromosome inactivation. The described workflow offers a promising framework and demonstrates the power of long-read transcriptomic data to provide mechanistic insight into complex allele-specific loci.
Project description: Large-scale sequencing of RNAs from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states. However, current single-cell RNA-sequencing (scRNA-seq) methods have limited ability to count RNAs at allele and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells. Here, we introduce Smart-seq3, which combines full-length transcriptome coverage with a 5’ unique molecular identifier (UMI) RNA counting strategy that enables in silico reconstruction of thousands of RNA molecules per cell. Importantly, a large portion of counted and reconstructed RNA molecules could be directly assigned to specific isoforms and allelic origin, and we identified significant transcript isoform regulation in mouse strains and human cell types. Moreover, Smart-seq3 showed a dramatic increase in sensitivity and typically detected thousands more genes per cell than Smart-seq2. Altogether, we developed a short-read sequencing strategy for single-cell RNA counting at isoform and allele resolution, applicable to large-scale characterization of cell types and states across tissues and organisms.
Project description: Accurate quantification of transcript isoforms is crucial for understanding gene regulation, functional diversity, and cellular behavior. Existing methods using either short-read (SR) or long-read (LR) RNA sequencing have significant limitations: SR sequencing provides high depth but struggles with isoform deconvolution, while LR sequencing offers isoform resolution at the cost of lower depth, higher noise, and technical biases. Addressing this gap, we introduce Multi-Platform Aggregation and Quantification of Transcripts (MPAQT), a generative model that combines the complementary strengths of different sequencing platforms to achieve state-of-the-art isoform-resolved transcript quantification, as demonstrated by extensive simulations and experimental benchmarks. Applying MPAQT to an in vitro model of human embryonic stem cell differentiation into cortical neurons, followed by machine learning-based modeling of mRNA abundance determinants, reveals the role of untranslated regions (UTRs) in isoform regulation through isoform-specific interactions with RNA-binding proteins that modulate mRNA stability. These findings highlight MPAQT's potential to enhance our understanding of transcriptomic complexity and underline the role of splicing-independent post-transcriptional mechanisms in shaping the isoform and exon usage landscape of the cell.
Project description: Over the past 15+ years, genetic studies have been notably successful in revealing the architecture of multiple psychiatric disorders, including schizophrenia. It is now widely acknowledged that schizophrenia is highly polygenic, consisting of both common variants of unknown significance, identified through genome-wide association studies (GWAS), and rare CNVs and coding variants. Despite these advances, the underlying mechanisms of specific genes implicated by GWAS are not well understood. Multiple lines of evidence implicate alternative splicing in the pathophysiology of schizophrenia. Single-nucleotide polymorphisms (SNPs) within implicated schizophrenia loci could alter isoform diversity and abundances, which may not be reflected in a typical differential expression study. Hence, we generated a comprehensive isoform survey of postmortem human dorsolateral prefrontal cortex (DLPFC) from schizophrenia cases and neurotypical controls to identify case-control isoform-level differences. We developed an analysis pipeline that combines the strengths of PacBio SMRT long-read RNA sequencing in conducting a detailed isoform census with the capacity of short-read RNA sequencing for the quantification of isoform abundances. From several hundred thousand discovered long-read isoforms we curated a transcriptome with tens of thousands of high-confidence novel isoforms. We then identified differential isoform usage (DIU) genes using a combination of established and in-house pipelines that enable case-control comparisons. Many of these novel isoforms are differentially expressed in schizophrenia DLPFC vs neurotypical controls. Differentially expressed genes are enriched in gene sets related to synaptic structure and function, RNA binding and splicing, as well as cell types previously implicated in schizophrenia, including cortical excitatory neurons, medium spiny neurons, and pyramidal CA1 neurons. Publicly available splicing data, genotyping, proteomics, and single nucleus sequencing results verify and support our results.
Project description: In this study, we used a barcoding-based synthetic long-read (SLR) isoform sequencing approach (LoopSeq) to generate sequencing reads sufficiently long and accurate to identify isoforms using standard short-read Illumina sequencers.
Project description: The goal of this project was to perform long-read RNA sequencing (LR-seq, PacBio) in combination with short-read RNA-seq for systematic characterization of the isoform diversity in primary breast tumor samples. We sequenced the full-length transcriptomes of 26 breast tumors and 4 normal breast samples.