
Dataset Information


A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites.


ABSTRACT:

Motivation

The position-weight matrix (PWM) is a useful representation of a transcription factor binding site (TFBS) sequence pattern because the PWM can be estimated from a small number of representative TFBS sequences. However, because the PWM probability model assumes independence between individual nucleotide positions, the PWMs for some TFs poorly discriminate binding sites from non-binding sites that have similar sequence content. Since the local three-dimensional DNA structure ('shape') is a determinant of TF binding specificity, and since DNA shape depends substantially on the underlying sequence, we combined DNA shape-derived features into a TF-generalized regulatory score and tested whether the score could improve PWM-based discrimination of TFBS from non-binding sites.
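The independence assumption can be made concrete with a minimal log-odds PWM sketch in Python. It assumes a uniform background and a pseudocount of 0.5 per base (our choices, not the paper's), and `build_pwm` and `pwm_score` are hypothetical helper names. Training on two toy sites with a strict pairwise dependency shows the limitation described above.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=None):
    """Estimate a log-odds PWM from aligned, equal-length binding-site sequences.

    Assumptions (not from the source): uniform background by default and a
    pseudocount of 0.5 per base, so a handful of sites still gives finite scores.
    """
    if background is None:
        background = {base: 0.25 for base in ALPHABET}
    pwm = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        # log2 ratio of the base frequency at position i to its background frequency
        pwm.append({base: math.log2(counts[base] / total / background[base])
                    for base in ALPHABET})
    return pwm

def pwm_score(pwm, seq):
    """Sum of per-position log-odds: each position contributes independently."""
    return sum(column[base] for column, base in zip(pwm, seq))

# Two toy "binding sites" with a perfect dependency: A pairs with T, T with A.
pwm = build_pwm(["AT", "TA"])
for candidate in ["AT", "TA", "AA", "TT", "CG"]:
    print(candidate, round(pwm_score(pwm, candidate), 3))
```

Because the score is a sum over independent columns, the never-observed candidates AA and TT receive exactly the same score as the observed sites AT and TA. Separating them requires information that depends on neighboring bases jointly, which is the kind of signal a shape-derived feature can carry.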

Results

We compared a traditional PWM model to a model that combines the PWM with a DNA shape feature-based regulatory potential score, in terms of accuracy in detecting binding sites for 75 vertebrate transcription factors (TFs). The PWM+shape model was more accurate than the PWM-only model for 45% of the TFs tested, with no significant loss of accuracy for the remaining TFs.
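One generic way to combine a PWM score with a second, shape-derived score is a two-feature logistic regression. The pure-Python sketch below assumes exactly that; it is not the model implemented in the regshape package, and `fit_logistic`, `predict`, and the toy feature values are hypothetical and purely illustrative. The two test candidates are given identical PWM scores so that only the shape score can rank them.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(bias, weights, x):
    """Probability that candidate x (a tuple of features) is a binding site."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(features, labels, lr=0.5, n_iter=2000):
    """Fit bias and weights by batch gradient descent on the mean logistic loss."""
    n, d = len(features), len(features[0])
    bias, weights = 0.0, [0.0] * d
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * d
        for x, y in zip(features, labels):
            residual = predict(bias, weights, x) - y
            grad_b += residual / n
            for j in range(d):
                grad_w[j] += residual * x[j] / n
        bias -= lr * grad_b
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return bias, weights

# Hypothetical (pwm_score, shape_score) pairs; the shape score is assumed centered
# so that positive values favor binding. Numbers are illustrative only.
features = [(1.0, 1.0), (1.0, 0.8), (2.0, 0.9),      # labeled binding sites
            (1.0, -1.0), (1.0, -0.8), (-2.0, -0.9)]  # labeled non-binding sites
labels = [1, 1, 1, 0, 0, 0]

bias, weights = fit_logistic(features, labels)
# Two new candidates with the same PWM score but different shape scores
p_site = predict(bias, weights, (1.0, 0.9))
p_nonsite = predict(bias, weights, (1.0, -0.9))
print(f"weights={weights}, p_site={p_site:.3f}, p_nonsite={p_nonsite:.3f}")
```

With the PWM scores tied, the fitted shape weight alone decides the ranking of the two candidates. In a real pipeline the shape score would itself be the output of a model built from many shape features, which this sketch does not attempt.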

Availability and implementation

The shape-based model is available as an open-source R package that is archived on the GitHub software repository at https://github.com/ramseylab/regshape/.

Contact

stephen.ramsey@oregonstate.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Yang J 

PROVIDER: S-EPMC4838056 | biostudies-literature | 2015 Nov

REPOSITORIES: biostudies-literature


Publications

A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites.

Jichen Yang, Stephen A. Ramsey

Bioinformatics (Oxford, England), 2015-06-30, issue 21



Similar Datasets

| S-EPMC3208542 | biostudies-literature
| S-EPMC2842295 | biostudies-literature
| S-EPMC3481455 | biostudies-literature
| S-EPMC3166302 | biostudies-literature
| S-EPMC5902669 | biostudies-literature
| S-EPMC6827144 | biostudies-literature
| S-EPMC7068855 | biostudies-literature
| S-EPMC130017 | biostudies-literature
| S-EPMC4021615 | biostudies-literature
| S-EPMC6041753 | biostudies-literature