
Dataset Information


dipwmsearch: a Python package for searching di-PWM motifs.


ABSTRACT:

Motivation

Searching a sequence for probabilistic motifs is a common way to annotate putative transcription factor binding sites and other DNA/RNA binding sites. Useful motif representations include position weight matrices (PWMs), dinucleotide PWMs (di-PWMs), and hidden Markov models (HMMs). Di-PWMs retain the simplicity of PWMs (a matrix form and a cumulative scoring function) but also capture dependencies between adjacent positions in the motif, which PWMs disregard. For instance, the HOCOMOCO database provides di-PWM motifs derived from experimental data to represent binding sites. Currently, two programs, SPRy-SARUS and MOODS, can search for occurrences of di-PWMs in sequences.
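To make the cumulative, adjacency-aware scoring concrete, here is a minimal Python sketch. It assumes a hypothetical layout (not the dipwmsearch API): a di-PWM for a motif of length m is stored as m-1 rows, where row i maps each dinucleotide to the score of the pair formed by positions i and i+1. The names `score_word`, `scan` and `toy`, and the toy scores, are illustrative only. The scan shown is the naive sliding-window approach, not the package's algorithm.

```python
from itertools import product

# Hypothetical layout (not the dipwmsearch API): a di-PWM for a motif of
# length m is a list of m-1 rows; row i maps each dinucleotide to the score
# of the pair (word[i], word[i+1]).
DiPWM = list[dict[str, float]]


def score_word(dipwm: DiPWM, word: str) -> float:
    """Cumulative score: sum of the scores of all adjacent nucleotide pairs."""
    if len(word) != len(dipwm) + 1:
        raise ValueError("word length must equal the number of rows + 1")
    return sum(row[word[i:i + 2]] for i, row in enumerate(dipwm))


def scan(dipwm: DiPWM, sequence: str, threshold: float) -> list[tuple[int, float]]:
    """Naive sliding-window scan: score every window, keep those >= threshold.

    Assumes an uppercase sequence over A, C, G, T only (no IUPAC codes).
    """
    m = len(dipwm) + 1
    hits = []
    for start in range(len(sequence) - m + 1):
        s = score_word(dipwm, sequence[start:start + m])
        if s >= threshold:
            hits.append((start, s))
    return hits


# Toy di-PWM of length 3 (made-up scores): rewards "AC" at positions 0-1
# and "CG" at positions 1-2; every other dinucleotide scores 0.
zero = {a + b: 0.0 for a, b in product("ACGT", repeat=2)}
toy = [dict(zero, AC=2.0), dict(zero, CG=3.0)]

print(scan(toy, "TACGTACA", 4.0))  # [(1, 5.0)]
```

Because each row is indexed by a pair of letters, the contribution of position i depends on its right-hand neighbour. A plain PWM would instead sum independent per-position scores.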

Results

We propose a Python package called dipwmsearch, which provides an original and efficient algorithm for this task: it first enumerates the words that match the di-PWM, and then searches for all of them at once in the sequence, even if the sequence contains IUPAC codes. Users benefit from easy installation via PyPI or conda, comprehensive documentation, and executable scripts that facilitate the use of di-PWMs.
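The enumerate-then-search strategy can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation. It uses a hypothetical row-per-dinucleotide layout (row i scores the pair at positions i and i+1). It also prunes the enumeration with a look-ahead bound, which is the best score any completion of a prefix could still add. All enumerated words share the motif length m, so the search step uses one hash-set lookup per window as a simple stand-in for a multi-pattern matcher such as an Aho-Corasick automaton. IUPAC codes in the sequence are not handled here. The names `suffix_bounds`, `enumerate_words`, `search_all` and `toy` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"


def suffix_bounds(dipwm):
    """bound[i][x]: best score obtainable from rows i.. given letter x at position i."""
    m = len(dipwm) + 1
    bound = [dict.fromkeys(ALPHABET, 0.0) for _ in range(m)]
    for i in range(m - 2, -1, -1):
        for x in ALPHABET:
            bound[i][x] = max(dipwm[i][x + y] + bound[i + 1][y] for y in ALPHABET)
    return bound


def enumerate_words(dipwm, threshold):
    """Return {word: score} for every word of length m scoring >= threshold."""
    m = len(dipwm) + 1
    bound = suffix_bounds(dipwm)
    words = {}

    def extend(prefix, score):
        i = len(prefix) - 1  # position of the prefix's last letter
        if score + bound[i][prefix[-1]] < threshold:
            return  # even the best completion cannot reach the threshold
        if len(prefix) == m:
            words[prefix] = score
            return
        for y in ALPHABET:
            extend(prefix + y, score + dipwm[i][prefix[-1] + y])

    for x in ALPHABET:
        extend(x, 0.0)
    return words


def search_all(words, sequence):
    """Single pass over the sequence, reporting windows that are enumerated words."""
    if not words:
        return []
    m = len(next(iter(words)))  # all enumerated words share the motif length
    hits = []
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        if window in words:
            hits.append((start, window, words[window]))
    return hits


# Toy di-PWM of length 3 (made-up scores): rewards "AC" then "CG".
zero = {a + b: 0.0 for a, b in product(ALPHABET, repeat=2)}
toy = [dict(zero, AC=2.0), dict(zero, CG=3.0)]

matching = enumerate_words(toy, 4.0)
print(matching)                          # {'ACG': 5.0}
print(search_all(matching, "TACGTACA"))  # [(1, 'ACG', 5.0)]
```

The bound is exact, so pruning never discards a word that reaches the threshold. The enumeration therefore returns the same words as scoring all 4^m candidates, while visiting far fewer of them when the threshold is high.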

Availability and implementation

dipwmsearch is available at https://pypi.org/project/dipwmsearch/ and https://gite.lirmm.fr/rivals/dipwmsearch/ under the CeCILL license.

SUBMITTER: Mille M 

PROVIDER: S-EPMC10081870 | biostudies-literature | 2023 Apr

REPOSITORIES: biostudies-literature


Publications

dipwmsearch: a Python package for searching di-PWM motifs.

Marie Mille, Julie Ripoll, Bastien Cazaux, Eric Rivals

Bioinformatics (Oxford, England), 2023-04-01, issue 4



Similar Datasets

| S-EPMC10085746 | biostudies-literature
| S-EPMC10385924 | biostudies-literature
| S-EPMC9810194 | biostudies-literature
| S-EPMC7597035 | biostudies-literature
| S-EPMC8138882 | biostudies-literature
| S-EPMC8275978 | biostudies-literature
| S-EPMC10997433 | biostudies-literature
| S-EPMC4837986 | biostudies-literature
| S-EPMC8168212 | biostudies-literature
| S-EPMC9692103 | biostudies-literature