Dataset Information

Inverse folding based pre-training for the reliable identification of intrinsic transcription terminators.

ABSTRACT: It is well-established that neural networks can predict or identify structural motifs of non-coding RNAs (ncRNAs). Yet, the neural network based identification of RNA structural motifs is limited by the availability of training data that are often insufficient for learning features of specific ncRNA families or structural motifs. Aiming to reliably identify intrinsic transcription terminators in bacteria, we introduce a novel pre-training approach that uses inverse folding to generate training data for predicting or identifying a specific family or structural motif of ncRNA. We assess the ability of neural networks to identify secondary structure by systematic in silico mutagenesis experiments. In a study to identify intrinsic transcription terminators as functionally well-understood RNA structural motifs, our inverse folding based pre-training approach significantly boosts the performance of neural network topologies, which outperform previous approaches to identify intrinsic transcription terminators. Inverse-folding based pre-training provides a simple, yet highly effective way to integrate the well-established thermodynamic energy model into deep neural networks for identifying ncRNA families or motifs. The pre-training technique is broadly applicable to a range of network topologies as well as different types of ncRNA families and motifs.

SUBMITTER: Brandenburg VB

PROVIDER: S-EPMC9262186 | biostudies-literature | 2022 Jul

REPOSITORIES: biostudies-literature


Publications

Inverse folding based pre-training for the reliable identification of intrinsic transcription terminators.

Brandenburg VB, Narberhaus F, Mosig A

PLoS Computational Biology, 2022 Jul 7, issue 7


PMID: 35797361

Similar Datasets

Project description: Intrinsic terminators, which encode GC-rich RNA hairpins followed immediately by a 7-to-9-nucleotide (nt) U-rich "U-tract," play principal roles of punctuating and regulating transcription in most bacteria. However, canonical intrinsic terminators with strong U-tracts are underrepresented in some bacterial lineages, notably mycobacteria, leading to proposals that their RNA polymerases stop at noncanonical intrinsic terminators encoding various RNA structures lacking U-tracts. We generated recombinant forms of mycobacterial RNA polymerase and its major elongation factors NusA and NusG to characterize mycobacterial intrinsic termination. Using in vitro transcription assays devoid of possible mycobacterial contaminants, we established that mycobacterial RNA polymerase terminates more efficiently than Escherichia coli RNA polymerase at canonical terminators with imperfect U-tracts but does not terminate at putative terminators lacking U-tracts even in the presence of mycobacterial NusA and NusG. However, mycobacterial NusG exhibits a novel termination-stimulating activity that may allow intrinsic terminators with suboptimal U-tracts to function efficiently.

IMPORTANCE: Bacteria rely on transcription termination to define and regulate units of gene expression. In most bacteria, precise termination and much regulation by attenuation are accomplished by intrinsic terminators that encode GC-rich hairpins and U-tracts necessary to disrupt stable transcription elongation complexes. Thus, the apparent dearth of canonical intrinsic terminators with recognizable U-tracts in mycobacteria is of significant interest both because noncanonical intrinsic terminators could reveal novel routes to destabilize transcription complexes and because accurate understanding of termination is crucial for strategies to combat mycobacterial diseases and for computational bioinformatics generally. Our finding that mycobacterial RNA polymerase requires U-tracts for intrinsic termination, which can be aided by NusG, will guide future study of mycobacterial transcription and aid improvement of predictive algorithms to annotate bacterial genome sequences.

