Dataset Information

Bayesian multiple-instance motif discovery with BAMBI: inference of recombinase and transcription factor binding sites.

ABSTRACT: Finding conserved motifs in genomic sequences is one of the essential problems in bioinformatics. However, achieving high discovery performance without imposing substantial auxiliary constraints on possible motif features remains a key algorithmic challenge. This work describes BAMBI, a sequential Monte Carlo motif-identification algorithm based on a position weight matrix model that requires no additional constraints and can estimate motif properties such as length, logo, number of instances and their locations solely from primary nucleotide sequence data. Furthermore, when biologically meaningful information about motif attributes is available, BAMBI uses it to further refine the discovery results. In practical applications, we show that the proposed approach can find the binding sites of DNA-binding molecules as diverse as the cAMP receptor protein (CRP) and Din-family site-specific serine recombinases. As measured by the nucleotide-level correlation coefficient, the results obtained by BAMBI in these and other settings show better statistical performance than any of four widely used profile-based motif discovery methods: MEME, BioProspector with BioOptimizer, SeSiMCMC and Motif Sampler. Additionally, in the case of Din-family recombinase target site discovery, the BAMBI-inferred motif is the only one found to be functionally accurate from the standpoint of the underlying biochemical mechanism. C++ and Matlab code is available at http://www.ee.columbia.edu/~guido/BAMBI or http://genomics.lbl.gov/BAMBI/.
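NOTE: The abstract names the nucleotide-level correlation coefficient as its performance measure but does not define it. The sketch below assumes the definition commonly used in motif-finder benchmarks, a Pearson-style correlation between per-nucleotide true and predicted site labels; the function name and the 0/1 mask encoding are illustrative choices and are not taken from the BAMBI code.

```python
import math

def nucleotide_cc(true_mask, pred_mask):
    """Nucleotide-level correlation coefficient (assumed definition).

    Each mask holds one 0/1 entry per nucleotide: 1 if that position
    lies inside a (true or predicted) motif instance, 0 otherwise.
    """
    if len(true_mask) != len(pred_mask):
        raise ValueError("masks must cover the same positions")

    # Per-nucleotide confusion counts.
    tp = sum(1 for t, p in zip(true_mask, pred_mask) if t and p)
    fn = sum(1 for t, p in zip(true_mask, pred_mask) if t and not p)
    fp = sum(1 for t, p in zip(true_mask, pred_mask) if not t and p)
    tn = sum(1 for t, p in zip(true_mask, pred_mask) if not t and not p)

    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # Undefined when either mask is constant; returning 0.0 is a
        # convention chosen for this sketch.
        return 0.0
    return (tp * tn - fn * fp) / denom

# A predicted site shifted by one position relative to the true site.
true_sites = [0, 1, 1, 1, 0, 0]
pred_sites = [0, 0, 1, 1, 1, 0]
score = nucleotide_cc(true_sites, pred_sites)
```

Under this definition the value ranges from -1 (every nucleotide mislabeled) to 1 (perfect per-nucleotide agreement), so a prediction that overlaps the true site only partially scores strictly between the two.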

SUBMITTER: Jajamovich GH

PROVIDER: S-EPMC3241671 | biostudies-literature | 2011 Nov

REPOSITORIES: biostudies-literature

Publications

Bayesian multiple-instance motif discovery with BAMBI: inference of recombinase and transcription factor binding sites.

Jajamovich GH, Wang X, Arkin AP, Samoilov MS

Nucleic Acids Research, 2011 Sep 24, issue 21

PMID: 21948794

Similar Datasets

Bayesian multiple instance regression for modeling immunogenic neoantigens. | S-EPMC8009201 | biostudies-literature

Bayesian centroid estimation for motif discovery. | S-EPMC3855595 | biostudies-literature

Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. | S-EPMC448439 | biostudies-other

Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. | S-EPMC3603212 | biostudies-literature

Bayesian inference of hub nodes across multiple networks. | S-EPMC6393214 | biostudies-literature

Bayesian inference and comparison of stochastic transcription elongation models. | S-EPMC7046298 | biostudies-literature

MBSTAR: multiple instance learning for predicting specific functional binding sites in microRNA targets. | S-EPMC4648438 | biostudies-literature

Bayesian inference on stochastic gene transcription from flow cytometry data. | S-EPMC6129284 | biostudies-literature

Correction: Bayesian inference and comparison of stochastic transcription elongation models. | S-EPMC8357164 | biostudies-literature

Multiple Instance Neuroimage Transformer. | S-EPMC9629332 | biostudies-literature