Dataset Information

Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random forest.


ABSTRACT: Many transcription factors (TFs) bind to different genomic loci depending on the cell type in which they are expressed, even though an individual TF usually binds to the same core motif across cell types. How a TF binds to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs (a motif grammar) at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier that predicts cell-type specificity from motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier? We present a Random Forest (RF) based approach to build a multi-class classifier that predicts the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to two published ChIP-seq datasets of TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell type. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, allowing us to discover the underlying cell-type specific motif grammar. Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific.
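The idea in the abstract (classify a binding site's cell type from its motif content, check the classifier by cross-validation, then read rules off the trees) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the motif names, cell-type labels, synthetic data, tree depth, and forest size are all assumptions chosen for illustration, and the rule listing simply enumerates root-to-leaf paths rather than reproducing the paper's rule mining procedure.

```python
import random
from collections import Counter

# Hypothetical motif names and cell types, for illustration only.
MOTIFS = ["motif_A", "motif_B", "motif_C", "motif_D", "motif_E", "motif_F"]
SIGNATURES = {"cell_X": (0, 1), "cell_Y": (2, 3), "cell_Z": (4, 5)}

def make_sites(n_per_type=60, noise=0.1, seed=1):
    """Synthetic sites: signature motifs always present, others present by chance."""
    rng = random.Random(seed)
    rows, labels = [], []
    for cell, sig in SIGNATURES.items():
        for _ in range(n_per_type):
            rows.append([1 if (j in sig or rng.random() < noise) else 0
                         for j in range(len(MOTIFS))])
            labels.append(cell)
    return rows, labels

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, n_try, rng, depth=0, max_depth=4):
    """Grow one tree; a node is a label (leaf) or (feature, absent, present)."""
    if len(set(labels)) == 1 or depth == max_depth:
        return majority(labels)
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):  # random feature subset
        absent = [i for i, r in enumerate(rows) if r[f] == 0]
        present = [i for i, r in enumerate(rows) if r[f] == 1]
        if not absent or not present:
            continue
        score = (len(absent) * gini([labels[i] for i in absent])
                 + len(present) * gini([labels[i] for i in present]))
        if best is None or score < best[0]:
            best = (score, f, absent, present)
    if best is None:
        return majority(labels)
    _, f, absent, present = best
    grow = lambda idx: build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                  n_try, rng, depth + 1, max_depth)
    return (f, grow(absent), grow(present))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, absent, present = node
        node = present if row[f] else absent
    return node

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, rng))
    return forest

def predict_forest(forest, row):
    return majority([predict_tree(tree, row) for tree in forest])

def cross_validate(rows, labels, k=5, seed=0):
    """Return overall k-fold cross-validated accuracy."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    correct = 0
    for fold in range(k):
        test = order[fold::k]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        forest = train_forest([rows[i] for i in train],
                              [labels[i] for i in train], seed=seed + fold)
        correct += sum(predict_forest(forest, rows[i]) == labels[i] for i in test)
    return correct / len(rows)

def extract_rules(node, names, path=()):
    """List every root-to-leaf path of one tree as (conditions, predicted label)."""
    if not isinstance(node, tuple):
        return [(path, node)]
    f, absent, present = node
    return (extract_rules(absent, names, path + ("no " + names[f],))
            + extract_rules(present, names, path + (names[f],)))

if __name__ == "__main__":
    rows, labels = make_sites()
    print("cross-validated accuracy:", cross_validate(rows, labels))
    for conditions, label in extract_rules(train_forest(rows, labels)[0], MOTIFS):
        print(" AND ".join(conditions) or "(always)", "->", label)
```

Each printed rule is a conjunction of motif presence or absence conditions ending in a predicted cell type, which is the sense in which tree paths can be read as candidate motif grammar. A real analysis would rank such rules by how well they discriminate between cell types and would use motif features derived from ChIP-seq peaks instead of synthetic rows.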

SUBMITTER: Wang X 

PROVIDER: S-EPMC5780765 | biostudies-literature | 2018 Jan

REPOSITORIES: biostudies-literature

Publications

Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random forest.

Wang X, Lin P, Ho JWK

BMC Genomics, 2018-01-19, Suppl 1



Similar Datasets

| S-EPMC5123866 | biostudies-literature
| S-EPMC4642938 | biostudies-literature
| S-EPMC7077988 | biostudies-literature
| S-EPMC4778349 | biostudies-literature
| S-EPMC4653392 | biostudies-literature
| S-EPMC10441439 | biostudies-literature
| S-EPMC7058150 | biostudies-literature
| S-EPMC2640218 | biostudies-literature
| S-EPMC3142959 | biostudies-literature
| S-EPMC2768654 | biostudies-literature