Dataset Information

Disentangling transcription factor binding site complexity.

ABSTRACT: The binding motifs of many transcription factors (TFs) comprise a higher degree of complexity than a single position weight matrix model permits. Additional complexity is typically taken into account either as intra-motif dependencies via more sophisticated probabilistic models or as heterogeneities via multiple weight matrices. However, both orthogonal approaches have limitations when learning from in vivo data, where binding sites of other factors in close proximity can interfere with motif discovery for the protein of interest. In this work, we demonstrate how intra-motif complexity can, purely by analyzing the statistical properties of a given set of TF-binding sites, be distinguished from complexity arising from an intermix with motifs of co-binding TFs or other artifacts. In addition, we study the related question of whether intra-motif complexity is represented more effectively by dependencies, heterogeneities or variants in between. Benchmarks demonstrate the effectiveness of both methods for their respective tasks, and applications to motif discovery output from recent tools detect and correct many undesirable artifacts. These results further suggest that the prevalence of intra-motif dependencies may have been overestimated in previous studies on in vivo data and should thus be reassessed.
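To make the abstract's contrast concrete, the following minimal Python sketch builds a position weight matrix (which treats every motif position independently) and computes the empirical mutual information between two positions as one simple indicator of an intra-motif dependency. It is an illustration only and not the method of the publication: the function names, the pseudocount, and the toy binding sites are all assumptions made up for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def position_weight_matrix(sites, pseudocount=1.0):
    """Estimate base probabilities for each position independently (hypothetical helper)."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm

def mutual_information(sites, i, j):
    """Empirical mutual information in bits between positions i and j (hypothetical helper)."""
    n = len(sites)
    count_i = Counter(s[i] for s in sites)
    count_j = Counter(s[j] for s in sites)
    count_ij = Counter((s[i], s[j]) for s in sites)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi

# Toy aligned sites (invented): positions 0 and 1 co-vary, position 3 varies freely.
sites = ["ACGT", "ACGA", "TGGT", "TGGA"]
pwm = position_weight_matrix(sites)
mi_dependent = mutual_information(sites, 0, 1)
mi_independent = mutual_information(sites, 0, 3)
```

In this toy set the weight matrix records only that A and T are equally likely at position 0 and that C and G are equally likely at position 1; it cannot express that A always pairs with C and T with G. The mutual information between positions 0 and 1 is high for exactly that reason, while it is zero between positions 0 and 3. On real in vivo data such a signal alone does not tell whether the dependency is intrinsic to the motif or caused by a mixture with sites of other factors, which is the question the abstract addresses.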

SUBMITTER: Eggeling R

PROVIDER: S-EPMC6237759 | biostudies-literature | 2018 Nov

REPOSITORIES: biostudies-literature

Publications

Disentangling transcription factor binding site complexity.

Eggeling R

Nucleic Acids Research, 2018-11-01, issue 20

PMID: 30085218

Similar Datasets

Varying levels of complexity in transcription factor binding motifs. | S-EPMC4605289 | biostudies-literature

Dynamics of Transcription Factor Binding Site Evolution. | S-EPMC4636380 | biostudies-literature

Subtypes of associated protein-DNA (Transcription Factor-Transcription Factor Binding Site) patterns. | S-EPMC3479201 | biostudies-literature

Evolutionary origins of transcription factor binding site clusters. | S-EPMC3278477 | biostudies-literature

Evaluating tools for transcription factor binding site prediction. | S-EPMC6889335 | biostudies-literature

COTRASIF: conservation-aided transcription-factor-binding site finder. | S-EPMC2673430 | biostudies-literature

Unraveling determinants of transcription factor binding outside the core binding site. | S-EPMC4484385 | biostudies-literature

A web server for transcription factor binding site prediction. | S-EPMC1891680 | biostudies-literature

The next generation of transcription factor binding site prediction. | S-EPMC3764009 | biostudies-literature

Every Site Counts: Submitting Transcription Factor-Binding Site Information through the CollecTF Portal. | S-EPMC4518829 | biostudies-literature