Dataset Information

Bayesian Markov models improve the prediction of binding motifs beyond first order.

ABSTRACT: Transcription factors (TFs) regulate gene expression by binding to specific DNA motifs. Accurate models for predicting binding affinities are crucial for a quantitative understanding of transcriptional regulation. Motifs are commonly described by position weight matrices, which assume that each position contributes independently to the binding energy. Models that can learn dependencies between positions, for instance those induced by DNA structure preferences, have yielded markedly improved predictions for most TFs on in vivo data. However, they are more prone to overfit the data and to learn patterns that are merely correlated with TF binding rather than directly involved in it. We present an improved, faster version of our Bayesian Markov model software, BaMMmotif2. We compared it with state-of-the-art motif discovery tools on a large collection of ChIP-seq and HT-SELEX datasets. Fifth-order BaMMmotif2 models achieved a median false-discovery-rate-averaged recall 13.6% and 12.2% higher than the next-best tool on 427 ChIP-seq datasets and 164 HT-SELEX datasets, respectively, while being 8 to 1000 times faster. BaMMmotif2 models showed no signs of overtraining in cross-cell line and cross-platform tests, with similar improvements over the next-best tool. These results demonstrate that dependencies beyond first order clearly improve binding models for most TFs.
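
To illustrate the distinction the abstract draws between position weight matrices and models with dependencies between positions, here is a minimal Python sketch. It is not BaMMmotif2 and does not reproduce its Bayesian training or its fifth-order models; the motif length, the probability tables, the uniform background, and the function names `pwm_log_odds` and `markov1_log_odds` are all hypothetical and chosen only for illustration. The toy first-order model encodes a preference for "CC" or "GG" at the last two positions, which a position weight matrix cannot represent.

```python
import math

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumed uniform background probability per nucleotide

# Toy position weight matrix (zeroth-order model): one distribution per position.
# All numbers are made up for illustration.
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
]

uniform = {b: 0.25 for b in ALPHABET}

# Toy first-order model: first_order[j][prev][cur] is the probability of
# nucleotide `cur` at position j given nucleotide `prev` at position j - 1.
first_order = [
    None,  # position 0 has no predecessor and falls back to pwm[0]
    {prev: dict(pwm[1]) for prev in ALPHABET},  # position 1: no dependency
    {
        "A": dict(uniform),
        "C": {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},  # after C, prefer C
        "G": {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},  # after G, prefer G
        "T": dict(uniform),
    },
]


def pwm_log_odds(seq):
    """Log2-odds score assuming every position contributes independently."""
    return sum(math.log2(pwm[j][b] / BACKGROUND) for j, b in enumerate(seq))


def markov1_log_odds(seq):
    """Log2-odds score where each position depends on the previous nucleotide."""
    score = math.log2(pwm[0][seq[0]] / BACKGROUND)
    for j in range(1, len(seq)):
        score += math.log2(first_order[j][seq[j - 1]][seq[j]] / BACKGROUND)
    return score


if __name__ == "__main__":
    for site in ("ACC", "ACG"):
        print(site, round(pwm_log_odds(site), 3), round(markov1_log_odds(site), 3))
```

In this toy setup the position weight matrix assigns "ACC" and "ACG" the same score, because each position is scored on its own, while the first-order model scores "ACC" higher than "ACG". Higher-order models extend the same idea by conditioning on more preceding nucleotides, at the cost of many more parameters, which is the overfitting risk the abstract mentions.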

SUBMITTER: Ge W

PROVIDER: S-EPMC8057495 | biostudies-literature | 2021 Jun

REPOSITORIES: biostudies-literature

Publications

Bayesian Markov models improve the prediction of binding motifs beyond first order.

Ge W, Meier M, Roth C, Söding J

NAR Genomics and Bioinformatics, 2021-04-20; 2

PMID: 33928244

Similar Datasets

MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs. | S-EPMC7203737 | biostudies-literature

Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences. | S-EPMC5291271 | biostudies-literature

Scalable Bayesian Inference for Coupled Hidden Markov and Semi-Markov Models. | S-EPMC7455056 | biostudies-literature

Combining chains of Bayesian models with Markov melding. | S-EPMC7614958 | biostudies-literature

Reflected generalized concentration addition and Bayesian hierarchical models to improve chemical mixture prediction. | S-EPMC10977799 | biostudies-literature

Bayesian Hidden Markov Models for Dependent Large-Scale Multiple Testing. | S-EPMC6818740 | biostudies-literature

Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites. | S-EPMC2432075 | biostudies-literature

Bayesian hidden Markov models for delineating the pathology of Alzheimer's disease. | S-EPMC5984196 | biostudies-literature

Using higher-order Markov models to reveal flow-based communities in networks. | S-EPMC4814833 | biostudies-other

Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses. | S-EPMC4764945 | biostudies-other