Dataset Information

Improved similarity scores for comparing motifs.

ABSTRACT:

Motivation

A question that often comes up after applying a motif finder to a set of co-regulated DNA sequences is whether the reported putative motif is similar to any known motif. While several tools have been designed for this task, Habib et al. pointed out that the scores that are commonly used for measuring similarity between motifs do not distinguish between a good alignment of two informative columns (say, all-A) and one of two uninformative columns. This observation explains why tools such as Tomtom occasionally return an alignment of uninformative columns which is clearly spurious. To address this problem, Habib et al. suggested a new score [Bayesian Likelihood 2-Component (BLiC)] which uses a Bayesian information criterion to penalize matches that are also similar to the background distribution.
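
The point attributed to Habib et al. can be illustrated with a small sketch. It assumes Euclidean distance as the column similarity score and a uniform background; the function names and example columns are hypothetical and are not taken from Tomtom or BLiC. A pair of all-A columns and a pair of uniform columns both score as perfect matches, although only the first pair carries any information.

```python
import math

def euclidean_distance(col_a, col_b):
    """Distance between two motif columns (probabilities over A, C, G, T)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(col_a, col_b)))

def information_content(col, background=(0.25, 0.25, 0.25, 0.25)):
    """Relative entropy of a column against the background, in bits."""
    return sum(p * math.log2(p / q) for p, q in zip(col, background) if p > 0)

# Hypothetical example columns, ordered A, C, G, T.
all_a = (1.0, 0.0, 0.0, 0.0)        # fully informative column
uniform = (0.25, 0.25, 0.25, 0.25)  # uninformative column

# Both alignments look equally good to the distance-based score.
informative_match = euclidean_distance(all_a, all_a)
uninformative_match = euclidean_distance(uniform, uniform)

print(informative_match, uninformative_match)
print(information_content(all_a), information_content(uniform))
```

Under these assumptions the two distances are identical while the information content differs, which is the kind of ambiguity that can let an alignment of uninformative columns rank well.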

Results

We show that the BLiC score exhibits other, highly undesirable properties, and we offer instead a general approach to adjust any motif similarity score so as to reduce the number of reported spurious alignments of uninformative columns. We implement our method in Tomtom and show that, without significantly compromising Tomtom's retrieval accuracy or its runtime, we can drastically reduce the number of uninformative alignments.

Availability and implementation

The modified Tomtom is available as part of the MEME Suite at http://meme.nbcr.net.

SUBMITTER: Tanaka E

PROVIDER: S-EPMC3106196 | biostudies-literature | 2011 Jun

REPOSITORIES: biostudies-literature

Publications

Improved similarity scores for comparing motifs.

Tanaka E, Bailey T, Grant CE, Noble WS, Keich U

Bioinformatics (Oxford, England), 2011-05-04, issue 12

PMID: 21543443

Similar Datasets

Quantifying similarity between motifs.
| S-EPMC1852410 | biostudies-literature

Determining protein similarity by comparing hydrophobic core structure.
| S-EPMC5300504 | biostudies-literature

ILNCSIM: improved lncRNA functional similarity calculation model.
| S-EPMC5041953 | biostudies-literature

Testing statistical significance scores of sequence comparison methods with structure similarity.
| S-EPMC1618413 | biostudies-literature

QAUST: Protein Function Prediction Using Structure Similarity, Protein Interaction, and Functional Motifs.
| S-EPMC9403031 | biostudies-literature

Recalibration Methods for Improved Clinical Utility of Risk Scores.
| S-EPMC8977399 | biostudies-literature

Comparing the similarity of different groups of bacteria to the human proteome.
| S-EPMC3338800 | biostudies-literature

Comparing a Query Compound with Drug Target Classes Using 3D-Chemical Similarity.
| S-EPMC7352980 | biostudies-literature

Beyond scores: A machine learning approach to comparing educational system effectiveness.
| S-EPMC10602239 | biostudies-literature

Propensity Scores: Confounder Adjustment When Comparing Nonrandomized Groups in Orthopaedic Surgery.
| S-EPMC10023476 | biostudies-literature