Dataset Information

A long context RNA foundation model for predicting transcriptome architecture

ABSTRACT: Linking DNA sequence to genomic function remains one of the grand challenges in genetics and genomics. Here, we combine large-scale single-molecule transcriptome sequencing of diverse cancer cell lines with cutting-edge machine learning to build LoRNASH, an RNA foundation model that learns how the nucleotide sequence of unspliced pre-mRNA dictates transcriptome architecture—the relative abundances and molecular structures of mRNA isoforms. Owing to its use of the StripedHyena architecture, LoRNASH handles extremely long sequence inputs at base-pair resolution (~65 kilobase pairs), allowing for quantitative, zero-shot prediction of all aspects of transcriptome architecture, including isoform abundance, isoform structure, and the impact of DNA sequence variants on transcript structure and abundance. We anticipate that our public data release and the accompanying frontier model will accelerate many aspects of RNA biotechnology. More broadly, we envision the use of LoRNASH as a foundation for fine-tuning of any transcriptome-related downstream prediction task, including cell-type specific gene expression, splicing, and general RNA processing.

ORGANISM(S): Homo sapiens

PROVIDER: GSE280041 | GEO | 2024/10/22

REPOSITORIES: GEO

Similar Datasets

RNA-Seq of mouse strains

Project description:The Mouse Genomes Project ( http://www.sanger.ac.uk/science/data/mouse-genomes-project ) uses next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains, and a selected set of wild-derived inbred strains. Access to complete sequence of multiple inbred strains will add to these resources and will become a permanent foundation for a systems biology approach to phenotypic variation in the mouse. In this particular study, we have sequenced the transcriptome of whole-brain tissue from 16 laboratory mouse strains to examine differences in gene expression levels, differential RNA-editing, and for use in de novo gene prediction.

2011-03-29 | E-MTAB-615 | biostudies-arrayexpress

RNA-Seq of different TadA variants to validate RNA Off-Targets

Project description:Mutation effects prediction is a fundamental challenge in biotechnology and biomedicine. State-of-the-art computational methods have demonstrated the benefits of including semantically rich representations learned from protein sequences, but leave structural constraints out of reach. Here we developed Protein Mutational Effect Predictor (ProMEP), a general and multimodal deep representation learning method that simultaneously learns sequence context and structural constraints from proteins at the scale of evolution. ProMEP markedly outperforms current leading methods and enables accurate zero-shot mutational effects prediction across a variety of deep mutational scanning experiments. The application of ProMEP in the transposon-associated TnpB enzyme engineering task further demonstrates its ability for high-throughput protein space exploration. Without prior knowledge of TnpB, ProMEP accurately identifies multiple mutations that significantly improve the editing efficiency from millions of variants.

2024-03-13 | GSE261254 | GEO

A Streamlined Tethered Chromosome Conformation Capture Protocol

Project description:Background: Identification of locus-locus contacts at the chromatin level provides a valuable foundation for understanding of nuclear architecture and function and a valuable tool for inferring long-range linkage relationships. As one approach to this, chromatin conformation capture-based techniques allow creation of genome spatial organization maps. While such approaches have been available for some time, methodological advances will be of considerable use in minimizing both time and input material required for successful application. Results: Here we report a modified tethered conformation capture protocol that utilizes a series of rapid and efficient molecular manipulations. We applied the method to Caenorhabditis elegans, obtaining chromatin interaction maps that provide a sequence-anchored delineation of salient aspects of Caenorhabditis elegans chromosome structure, demonstrating a high level of consistency in overall chromosome organization between biological samples collected under different conditions. In addition to the application of the method to defining nuclear architecture, we found the resulting chromatin interaction maps to be of sufficient resolution and sensitivity to enable detection of large-scale structural variants such as inversions or translocations. Conclusion: Our streamlined protocol provides an accelerated, robust, and broadly applicable means of generating chromatin spatial organization maps and detecting genome rearrangements without a need for cellular or chromatin fractionation. Application of a modified version of the TCC protocol using different C. elegans strains (N2 and glp-1) at the L1 and adult life stages.

2016-01-20 | E-GEOD-76930 | biostudies-arrayexpress

A long context RNA foundation model for predicting transcriptome architecture

Project description:A long context RNA foundation model for predicting transcriptome architecture

| PRJNA1176011 | ENA

Reconstruction of 3-dimensional tissue organization at the single-cell resolution [ST]

Project description:We designed a neural network-based computational method that learns transcriptome-to-space mapping and reconstructs 3D tissue organization by learning from scRNA-seq and spatial transcriptomic data.

2023-08-08 | GSE220572 | GEO

Reconstruction of 3-dimensional tissue organization at the single-cell resolution [scRNA-seq]

2023-08-08 | GSE220571 | GEO

Pervasive 3'-UTR isoform switches during mouse oocyte maturation

Project description:Oocyte maturation is the foundation for developing healthy individuals of mammals. Upon germinal vesicle breakdown, oocyte meiosis resumes and the synthesis of new transcripts ceases. To quantitatively profile the transcriptomic dynamics after meiotic resumption throughout the oocyte maturation, we generated transcriptome sequencing data with individual mouse oocytes at three main developmental stages: germinal vesicle (GV), metaphase I (MI), and metaphase II (MII). When clustering the sequenced oocytes, results showed that isoform-level expression analysis outperformed gene-level analysis, indicating isoform expression provided extra information that was useful in distinguishing oocyte stages. Comparing transcriptomes of the oocytes at the GV stage and the MII stage, in addition to identification of differentially expressed genes (DEGs), we detected many differentially expressed transcripts (DETs), some of which came from genes that were not identified as DEGs. When breaking down the isoform-level changes into alternative RNA processing events, we found the main source of isoform composition changes was the alternative usage of polyadenylation sites. With detailed analysis focusing on the alternative usage of 3'-UTR isoforms, we identified, out of 3810 tested genes, 512 (13.7%) exhibiting significant switches of 3'-UTR isoforms during the process of mouse oocyte maturation. Altogether, our data and analyses suggest the importance of examining isoform abundance changes during oocyte maturation, and further investigation of the pervasive 3'-UTR isoform switches in the transition may deepen our understanding of the molecular mechanisms underlying mammalian early development.

2021-10-05 | GSE178836 | GEO

Spatial transcriptomics dataset of primary tumours from MDA-MB-231 xenograft model

Project description:Identifying functionally important cell states and structure within heterogeneous tumors remains a significant biological and computational challenge. Current clustering or trajectory-based models are ill-equipped to address the notion that cancer cells reside along a phenotypic continuum. We present Archetypal Analysis network (AAnet), a neural network that learns archetypal states within a phenotypic continuum in single-cell data. Unlike traditional archetypal analysis, AAnet learns archetypes in simplex-shaped neural network latent space. Using pre-clinical models and clinical breast cancers, AAnet resolves distinct cell states and processes, including cell proliferation, hypoxia, metabolism and immune interactions. Primary tumor archetypes are recapitulated in matched liver, lung and lymph node metastases. The dataset here comprises 10x Genomics-based spatial transcriptomics (Visium) data on MDA-MB-231 xenografts, used to perform archetypal analysis and understand spatial aspects of tumour heterogeneity. In the manuscript, scRNA-seq datasets from matched models and metastases are projected onto this spatial transcriptomics data to understand the spatial dependencies and characteristics of these archetypes.

2025-06-24 | GSE300613 | GEO

Multi-scale classification decodes the complexity of the human E3 ligome

Project description:E3 ubiquitin ligases are key regulators of protein homeostasis, targeting specific proteins for degradation via the ubiquitin-proteasome system (UPS). They provide crucial substrate specificity, making them promising candidates for the design of novel therapeutics. This work presents a comprehensive, annotated dataset of high-confidence catalytic human E3 ligases, termed the “E3 ligome”. Integrating disparate data from various granularity layers, including protein sequence, domain architecture, 3D structure, function, localization, and expression, we learn an emergent distance metric, capturing authentic relationships within this heterogeneous group. A weakly-supervised hierarchical classification framework identifies conserved features of E3 families and subfamilies, consistent with RING, HECT, and RBR classes. This classification explains functional segregation, identifies multi-subunit and standalone enzymes, and integrates substrate and small molecule interaction networks. Our analysis provides a global view of E3 biology, opening new strategies for drugging E3-substrate networks, including drug re-purposing and designing new E3 handles.

2025-11-18 | PXD067015 | Pride

A Streamlined Tethered Chromosome Conformation Capture Protocol

2016-01-20 | GSE76930 | GEO