Dataset Information

A long-read RNA-seq approach to identify novel transcripts of very large genes

ABSTRACT: RNA-seq is widely used for studying gene expression, but commonly used sequencing platforms produce short reads that span at most two exon junctions per read. This makes it difficult to accurately determine the composition and phasing of exons within transcripts. Although long-read sequencing alleviates this issue, it is not amenable to precise quantitation, which limits its utility for differential expression studies. We used long-read isoform sequencing combined with a novel analysis approach to compare alternative splicing of large, repetitive structural genes in muscles. Analysis of muscle structural genes that produce medium (Nrap, 5 kb), large (Nebulin, 22 kb), and very large (Titin, 106 kb) transcripts in cardiac muscle and in fast and slow skeletal muscles identified unannotated exons for each of these ubiquitous muscle genes. This analysis also identified differential exon usage and phasing for these genes between the different muscle types. By mapping the in-phase transcript structures to known annotations, we also identified and quantified previously unannotated transcripts. Results were confirmed by endpoint PCR and Sanger sequencing, which revealed muscle-type-specific differential expression of these novel transcripts. The improved transcript identification and quantification demonstrated by our approach removes previous impediments to studies aimed at quantitative differential expression of ultra-long transcripts.

ORGANISM(S): Mus musculus

PROVIDER: GSE138362 | GEO | 2020/05/17

REPOSITORIES: GEO

Similar Datasets

Using long-read CAGE sequencing to profile cryptic-promoter derived transcripts and their contribution to the immunopeptidome

Project description:Recent studies have demonstrated that the non-coding genome can produce unannotated proteins as antigens that induce an immune response. One major source of this activity is the aberrant epigenetic reactivation of transposable elements (TEs). In tumors, TEs often provide cryptic or alternate promoters, which can generate transcripts that encode tumor-specific unannotated proteins. Thus, TE-derived transcripts have the potential to produce tumor-specific, but recurrent, antigens shared among many tumors. Identification of TE-derived tumor antigens holds the promise to improve cancer immunotherapy approaches; however, current genomics and computational tools are not optimized for their detection. Here we combined CAGE technology with full-length long-read transcriptome sequencing (Long-Read CAGE, or LRCAGE) and developed a suite of computational tools to significantly improve immunopeptidome detection by incorporating TE-derived and other tumor transcripts into the proteome database. By applying our methods to human lung cancer cell line H1299 data, we demonstrated that long-read technology significantly improves mapping of promoters with low mappability scores and that LRCAGE guarantees accurate construction of uncharacterized 5’ transcript structure. Unannotated peptides predicted from newly characterized transcripts were readily detectable in whole cell lysate mass-spectrometry data. Incorporating unannotated peptides into the proteome database enabled us to detect non-canonical antigens in HLA-pulldown LC-MS/MS data. Finally, we showed that epigenetic treatment increased the number of non-canonical antigens, particularly those encoded by TE-derived transcripts, which might expand the pool of targetable antigens for cancers with low mutational burden.

2023-09-23 | PXD040265 | Pride

Project description:KIF1A long read phasing

| PRJNA1244324 | ENA

Dynamic transcriptomes during neural differentiation of human embryonic stem cells

Project description:Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. In order to examine the fundamental mechanisms governing neural differentiation, we analyzed the transcriptome changes that occur during the differentiation of human embryonic stem cells (hESCs) into the neural lineage. Undifferentiated hESCs as well as cells at three stages of early neural differentiation, N1 (early initiation), N2 (neural progenitor), and N3 (early glial-like), were analyzed using a combination of single read, paired-end read, and long read RNA sequencing. The results revealed enormous complexity in gene transcription and splicing dynamics during neural cell differentiation. We found previously unannotated transcripts and spliced isoforms specific for each stage of differentiation. Interestingly, splicing isoform diversity is highest in undifferentiated hESCs and decreases upon differentiation, a phenomenon we call “isoform specialization.” During neural differentiation, we observed differential expression of many types of genes, including those involved in key signaling pathways, and a large number of extracellular receptors exhibit stage-specific regulation. These results provide a valuable resource for studying neural differentiation and reveal insights into the mechanisms underlying in vitro neural differentiation of hESCs, such as neural fate specification, NPC identity maintenance, and the transition from a predominantly neuronal state into one with increased gliogenic potential.

2010-03-04 | GSE20301 | GEO

Merging short and stranded long reads improves transcript assembly

Project description:New tools for improved long-read transcript assembly and coalescence with its short-read counterpart are required. Using our short- and long-read measurements from different cell lines with spiked-in standards, we systematically compared key parameters and biases in the read alignment and assembly of transcripts. We report a cDNA synthesis artifact in long-read datasets that impacts the identity and quantitation of assembled transcripts. We developed a computational pipeline to strand long-read cDNA libraries that markedly improves assembly of transcripts from long-reads. Incorporating stranded long-reads in a new hybrid assembly approach, we demonstrate its efficacy for improved characterization of challenging lncRNA transcripts. Our workflow can be applied to a wide range of transcriptomics datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.

2023-10-14 | GSE215357 | GEO

Merging short and stranded long reads improves transcript assembly

2023-10-14 | GSE215355 | GEO

Project description:Long read sequencing of RHOH mRNA transcripts in B-cells reveals new exons and splicing patterns.

| PRJNA690664 | ENA

Multi-omic profiling of pathogen-stimulated primary immune cells

Project description:Objectives: To perform long-read transcriptome and proteome profiling of pathogen-stimulated peripheral blood mononuclear cells (PBMCs) from healthy donors. We aim to discover new transcripts and protein isoforms expressed during immune responses to diverse pathogens. Methods: PBMCs were exposed to four microbial stimuli for 24 hours: the TLR4 ligand lipopolysaccharide (LPS), the TLR3 ligand Poly(I:C), heat-inactivated Staphylococcus aureus, and Candida albicans, with RPMI medium as the negative control. Long-read sequencing (PacBio) of one donor and secretome proteomics and short-read sequencing of five donors were performed. IsoQuant was used for transcriptome construction, Metamorpheus/FlashLFQ for proteome analysis, and Illumina short-read 3’-end mRNA sequencing for transcript quantification. Results: Long-read transcriptome profiling reveals the expression of novel sequences and isoform switching induced upon pathogen stimulation, including transcripts that are difficult to detect using traditional short-read sequencing. We observe widespread loss of intron retention as a common result of all pathogen stimulations. We highlight novel transcripts of NFKB1 and CASP1 that may indicate novel immunological mechanisms. In general, RNA expression differences did not result in differences in the amounts of secreted proteins. Interindividual differences in the proteome were larger than the differences between stimulated and unstimulated PBMCs. Clustering analysis of secreted proteins revealed a correlation between chemokine (receptor) expression on the RNA and protein levels in C. albicans- and Poly(I:C)-stimulated PBMCs. Conclusion: Isoform-aware long-read sequencing of pathogen-stimulated immune cells highlights the potential of these methods to identify novel transcripts, revealing a more complex transcriptome landscape than previously appreciated.

2023-09-16 | PXD045237 | Pride

Integrated detection and quantification of aberrant transcripts with novel splicing events

Project description:Splicing misregulation, such as the inclusion of previously unknown cryptic exons, is implicated in numerous diseases. Recent methods have made detection of such splicing alterations in disease phenotypes more accurate and efficient. However, the quantification and differential analysis of non-canonical splicing alterations remain focused at the splice-event level, thus preventing a complete view of the effects on the downstream transcriptomic landscape. Here, we present a novel and integrated pipeline, SpliCeAT, that (1) detects and quantifies differential non-canonical splicing events from short-read bulk RNA-seq data, (2) augments the canonical transcriptome with novel transcripts containing these non-canonical splicing events, and (3) performs transcript-level differential analysis to identify and quantify aberrant cryptic exon-containing transcripts based on this augmented transcriptome. Using TDP-43, an ALS/FTD-associated RNA-binding protein, as an example, we identified and catalogued aberrant splicing events in embryonic mouse brains. The accuracy of our integrated pipeline was further confirmed and validated with long-read isoform sequencing. Furthermore, by comparing neuronal TDP-43 knockouts in mice with a publicly available human dataset with TDP-43 pathology, we identified and validated 4 common genes, namely, Kalrn/KALRN, Poldip3/POLDIP3, Rnf144a/RNF144A, and Unc13a/UNC13A, with cryptic exons. In summary, with our integrated pipeline, novel splice events are identified, incorporated, and quantified at the transcript level, thereby enabling more complete transcriptome profiling of well-annotated genomes in the case of pathological splicing misregulation.

2025-02-02 | GSE288457 | GEO

A long-read RNA-seq approach to identify novel transcripts of very large genes

Project description:A long-read RNA-seq approach to identify novel transcripts of very large genes

| PRJNA575604 | ENA

Deep sequencing of the Caenorhabditis elegans transcriptome using RNA isolated from various developmental stages under various experimental conditions RW0001

Project description:The goal of this study, started as a part of the modENCODE project, is to detect and characterize previously unannotated transcripts of the C. elegans genome. This dataset has been imported from the Sequence Read Archive and curated by the WormBase and ArrayExpress teams.

2010-02-26 | E-MTAB-2683 | biostudies-arrayexpress
