
Dataset Information


DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.


ABSTRACT: Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites. We describe a sequence+shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum+shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch+shape kernel performs better than the di-mismatch kernel for intermediate k values. The software is available at https://bitbucket.org/wenxiu/sequence-shape.git. Contact: rohs@usc.edu or william-noble@uw.edu. Supplementary data are available at Bioinformatics online.
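
For readers unfamiliar with the classical k-spectrum kernel that the abstract uses as a baseline, the following is a minimal Python sketch. It compares two sequences by the inner product of their k-mer count vectors, so no alignment is needed. It is not the authors' sequence+shape kernel and includes no DNA shape features; the function names `kmer_counts` and `spectrum_kernel` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (illustrative helper, not from the paper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k):
    """Classical k-spectrum kernel: inner product of the two k-mer count vectors.

    Assumption: plain, unnormalized counts; no mismatches and no shape features.
    """
    counts_s = kmer_counts(s, k)
    counts_t = kmer_counts(t, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(n * counts_t[kmer] for kmer, n in counts_s.items())


if __name__ == "__main__":
    # Two short probes sharing the 2-mers AC, CG and GT.
    print(spectrum_kernel("ACGT", "ACGTT", 2))
```

Because the kernel depends only on k-mer content, sequences of different lengths can be compared directly, which is what makes this family of kernels alignment-free.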

SUBMITTER: Ma W 

PROVIDER: S-EPMC5870879 | biostudies-literature | 2017 Oct

REPOSITORIES: biostudies-literature


Publications

DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.

Wenxiu Ma, Lin Yang, Remo Rohs, William Stafford Noble

Bioinformatics (Oxford, England), 2017-10-01, issue 19



Similar Datasets

| S-EPMC4403198 | biostudies-literature
| S-EPMC2808352 | biostudies-literature
| S-EPMC4231734 | biostudies-literature
2014-11-04 | GSE59845 | GEO
2015-03-01 | GSE60200 | GEO
2014-09-04 | E-GEOD-61105 | biostudies-arrayexpress
2014-09-04 | GSE61105 | GEO
2014-11-04 | E-GEOD-59845 | biostudies-arrayexpress
2015-03-01 | E-GEOD-60200 | biostudies-arrayexpress
| S-EPMC2718632 | biostudies-literature