Dataset Information

LinearSampling: Linear-Time Stochastic Sampling of RNA Secondary Structure with Applications to SARS-CoV-2.


ABSTRACT: Many RNAs fold into multiple structures at equilibrium. The classical stochastic sampling algorithm samples secondary structures according to their probabilities in the Boltzmann ensemble and is widely used, e.g., for accessibility prediction. However, the current sampling algorithm, which consists of a bottom-up partition function phase followed by a top-down sampling phase, suffers from three limitations: (a) the formulation and implementation of the sampling phase are unnecessarily complicated; (b) much redundant work is repeatedly performed in the sampling phase; (c) the partition function runtime scales cubically with the sequence length. These issues prevent it from being used for full-length viral genomes such as SARS-CoV-2. To address these problems, we first present a hypergraph framework under which the sampling algorithm can be greatly simplified. We then present three sampling algorithms under this framework, two of which eliminate redundant work in the sampling phase. Finally, we present LinearSampling, an end-to-end linear-time sampling algorithm that is orders of magnitude faster than the standard algorithm. For instance, LinearSampling is 111 times faster (48 s vs. 1.5 h) than Vienna RNAsubopt on the longest sequence in the RNAcentral dataset that RNAsubopt can run (15,780 nt). More importantly, LinearSampling is the first sampling algorithm to scale to the full genome of SARS-CoV-2, taking only 96 seconds on its reference sequence (29,903 nt). It finds 23 regions of 15 nt with high accessibilities, which could potentially be used for COVID-19 diagnostics and drug design.
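
The two-phase scheme the abstract describes (a bottom-up partition function pass followed by top-down sampling) can be illustrated with a toy sketch. This is not LinearSampling and not the authors' code: it assumes a simplified Nussinov-style model in which every allowed base pair carries the same Boltzmann weight, it runs in cubic time, and it has no hypergraph, lazy sampling, or beam pruning. The names `partition`, `sample`, `accessibility`, `pair_weight`, and `MIN_LOOP` are hypothetical and chosen for illustration only.

```python
import math
import random

# Assumption: canonical and wobble pairs, each with one constant weight.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a pair


def partition(seq, pair_weight=math.e):
    """Bottom-up phase: Q[i][j] sums the weights of all structures on seq[i:j]."""
    n = len(seq)
    Q = [[1.0] * (n + 1) for _ in range(n + 1)]  # empty/short spans: one structure
    for span in range(1, n + 1):
        for i in range(0, n - span + 1):
            j = i + span                      # half-open end; last base is j - 1
            total = Q[i][j - 1]               # case 1: base j - 1 is unpaired
            for k in range(i, j - 1 - MIN_LOOP):
                if (seq[k], seq[j - 1]) in PAIRS:
                    # case 2: k pairs with j - 1, splitting the span in two
                    total += Q[i][k] * pair_weight * Q[k + 1][j - 1]
            Q[i][j] = total
    return Q


def sample(seq, Q, rng, pair_weight=math.e):
    """Top-down phase: draw one structure with probability weight / Q[0][n]."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq))]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP + 1:
            continue                          # span too short to hold a pair
        r = rng.random() * Q[i][j] - Q[i][j - 1]
        if r < 0:
            stack.append((i, j - 1))          # base j - 1 stays unpaired
            continue
        chosen = None
        for k in range(i, j - 1 - MIN_LOOP):
            if (seq[k], seq[j - 1]) in PAIRS:
                chosen = k                    # remember last valid partner
                r -= Q[i][k] * pair_weight * Q[k + 1][j - 1]
                if r < 0:
                    break
        if chosen is None:                    # rounding guard; not expected
            stack.append((i, j - 1))
            continue
        structure[chosen], structure[j - 1] = "(", ")"
        stack.append((i, chosen))
        stack.append((chosen + 1, j - 1))
    return "".join(structure)


def accessibility(samples, window):
    """Fraction of samples in which each window is completely unpaired."""
    n = len(samples[0])
    open_window = "." * window
    return [
        sum(s[start:start + window] == open_window for s in samples) / len(samples)
        for start in range(n - window + 1)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "GGGAAAUCCCAAAAGGGUUUCCC"
    Q = partition(seq)
    samples = [sample(seq, Q, rng) for _ in range(1000)]
    print(samples[0])
    print(accessibility(samples, window=4))
```

In this sketch, the partition function table is filled once and then reused for every sample, which mirrors the division of labor between the two phases. Window accessibility is estimated as the fraction of samples in which the window contains no paired base; the 15 nt regions mentioned in the abstract would correspond to `window=15` under this assumed definition.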

SUBMITTER: Zhang H 

PROVIDER: S-EPMC7781300 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC9881153 | biostudies-literature
| S-EPMC8719904 | biostudies-literature
| S-EPMC8609897 | biostudies-literature
| S-EPMC7927282 | biostudies-literature
| S-EPMC8878378 | biostudies-literature
| S-EPMC10570024 | biostudies-literature
| S-EPMC7217285 | biostudies-literature
| S-EPMC169020 | biostudies-literature
| S-EPMC297010 | biostudies-literature
| S-EPMC9344895 | biostudies-literature