
Dataset Information

A new exhaustive method and strategy for finding motifs in ChIP-enriched regions.


ABSTRACT: ChIP-seq, which combines chromatin immunoprecipitation (ChIP) with next-generation parallel sequencing, allows for the genome-wide identification of protein-DNA interactions. This technology poses new challenges for the development of novel motif-finding algorithms and of methods for determining exact protein-DNA binding sites from ChIP-enriched sequencing data. State-of-the-art heuristic and exhaustive search algorithms are largely limited to identifying short (l, d) motifs (l ≤ 10, d ≤ 2) in ChIP-enriched regions. In this work we have developed a more powerful exhaustive method (FMotif) for finding long (l, d) motifs in DNA sequences. In conjunction with our method, we have adopted a simple ChIP-enriched sampling strategy for finding these motifs in large-scale ChIP-enriched regions. Empirical studies on synthetic samples and applications to several ChIP data sets, including 16 TF (transcription factor) ChIP-seq data sets and five TF ChIP-exo data sets, have demonstrated that our proposed method is capable of finding these motifs with high efficiency and accuracy. The source code for FMotif is available at http://211.71.76.45/FMotif/.
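An (l, d) motif is a length-l pattern that occurs in the input sequences with at most d mismatches (Hamming distance ≤ d) per occurrence. The sketch below is a generic brute-force, pattern-driven search written only to illustrate that definition; it is not FMotif's algorithm. It enumerates all 4^l candidate patterns, which shows why naive exhaustive search becomes impractical as l grows. The function names and the `min_frac` support threshold are hypothetical choices made for this illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(seq: str, pattern: str, d: int) -> bool:
    """True if some window of seq is within Hamming distance d of pattern."""
    l = len(pattern)
    return any(hamming(seq[i:i + l], pattern) <= d
               for i in range(len(seq) - l + 1))


def exhaustive_ld_motifs(seqs, l, d, min_frac=1.0):
    """Brute-force (l, d) motif search (illustrative, NOT FMotif).

    Tries every one of the 4**l patterns over ACGT and keeps those that
    occur with at most d mismatches in at least min_frac of the sequences
    (min_frac is a hypothetical parameter). Returns (pattern, support) pairs.
    """
    needed = min_frac * len(seqs)
    hits = []
    for letters in product(ALPHABET, repeat=l):
        cand = "".join(letters)
        support = sum(occurs_within(s, cand, d) for s in seqs)
        if support >= needed:
            hits.append((cand, support))
    return hits


# Toy input: each sequence carries ACGTAC with at most one mismatch.
seqs = ["TTACGTACTT", "GGACGAACGG", "CCACGTTCCC"]
motifs = exhaustive_ld_motifs(seqs, l=6, d=1)
print("ACGTAC" in [m for m, _ in motifs])  # True
```

With l = 6 this already scans 4,096 candidates; each additional position multiplies the count by four, so this naive enumeration does not scale to the long motifs the abstract targets.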

SUBMITTER: Jia C 

PROVIDER: S-EPMC3901781 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

A new exhaustive method and strategy for finding motifs in ChIP-enriched regions.

Jia C, Carson MB, Wang Y, Lin Y, Lu H

PLoS ONE, 2014 Jan 24; issue 1


ChIP-seq, which combines chromatin immunoprecipitation (ChIP) with next-generation parallel sequencing, allows for the genome-wide identification of protein-DNA interactions. This technology poses new challenges for the development of novel motif-finding algorithms and methods for determining exact protein-DNA binding sites from ChIP-enriched sequencing data. State-of-the-art heuristic, exhaustive search algorithms have limited application for the identification of short (l, d) motifs (l ≤ 10, d  ...[more]

Similar Datasets

S-EPMC2919736 | biostudies-literature
S-EPMC1803799 | biostudies-literature
S-EPMC4022013 | biostudies-literature
S-EPMC2912305 | biostudies-literature
S-EPMC3532365 | biostudies-literature
S-EPMC2762409 | biostudies-literature
S-EPMC6748772 | biostudies-literature
S-EPMC519121 | biostudies-literature
S-EPMC4234855 | biostudies-literature
S-EPMC2651809 | biostudies-literature