Dataset Information

DMS-MaPseq chemical probing of pri-miRNA and human mRNA segments


ABSTRACT: Understanding macromolecular structures, such as proteins and nucleic acids, is critical for discerning their functions and biological roles. Advanced techniques like crystallography, NMR, and cryo-EM have facilitated the determination of over 180,000 protein structures, all cataloged in the Protein Data Bank (PDB). This comprehensive repository has been pivotal in developing deep learning algorithms for predicting protein structures directly from sequences. In contrast, RNA structure prediction has lagged, primarily due to a scarcity of RNA structural data. Here, we present the secondary structures of 1098 pre-miRNAs and 1456 human mRNA regions determined through chemical probing. We develop a novel deep learning architecture, inspired by the Evoformer model of AlphaFold and by traditional architectures for secondary structure prediction. This new model, called eFold, was trained on our newly created database and on over 100,000 secondary structures from multiple sources. We benchmark eFold on a set of challenging RNA structures and show that both our new architecture and our dataset contribute to increased prediction performance, outperforming similar end-to-end methods. This result reveals that merely expanding the database size is inadequate; rather, incorporating a greater diversity and complexity of structures is crucial for enhancing performance.

ORGANISM(S): synthetic construct

PROVIDER: GSE262014 | GEO | 2025/03/05

REPOSITORIES: GEO

Similar Datasets

2024-10-22 | GSE280041 | GEO
2023-05-10 | BIOMD0000001071 | BioModels
2023-02-22 | PXD036833 | JPOST Repository
| PRJNA795248 | ENA
| 46649 | ecrin-mdr-crc
2024-08-31 | GSE246859 | GEO
2010-05-19 | E-GEOD-15370 | biostudies-arrayexpress
| PRJEB56211 | ENA
2016-05-03 | E-MTAB-4012 | biostudies-arrayexpress
2024-01-09 | GSE250290 | GEO