
Dataset Information


PolyaPeak: detecting transcription factor binding sites from ChIP-seq using peak shape information.


ABSTRACT: ChIP-seq is a powerful technology for detecting genomic regions where a protein of interest interacts with DNA. ChIP-seq data for mapping transcription factor binding sites (TFBSs) have a characteristic pattern: around each binding site, sequence reads aligned to the forward and reverse strands of the reference genome form two separate peaks shifted away from each other, and the true binding site lies between these two peaks. Although previous work has shown that modeling this pattern can improve the accuracy and resolution of binding site detection, efficient methods that fully exploit this information in TFBS detection have been lacking. We present PolyaPeak, a new method that improves TFBS detection by incorporating peak shape information. PolyaPeak describes peak shapes using a flexible Pólya model. The shapes are learnt automatically from the data using a Minorization-Maximization (MM) algorithm and are then integrated with the read count information through a hierarchical model to distinguish true binding sites from background noise. Extensive analyses of real data show that PolyaPeak robustly improves TFBS detection compared with existing methods. An R package is freely available.
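The abstract says PolyaPeak learns peak shapes with a Pólya model fitted by an MM algorithm. As an illustration only, the sketch below assumes each candidate region is summarized as read counts in a fixed number of bins. It treats those count vectors as Dirichlet-multinomial (Pólya) draws and fits the shape parameters with a standard MM update for that distribution. The binning, the function names `polya_loglik`, `mm_step` and `fit_polya_mm`, and the example counts are all hypothetical. This is not the PolyaPeak package's code, and it omits the hierarchical read-count model.

```python
import math
from typing import List, Sequence

Counts = Sequence[Sequence[int]]  # one row of per-bin read counts per region


def polya_loglik(counts: Counts, alpha: Sequence[float]) -> float:
    """Polya (Dirichlet-multinomial) log-likelihood of the bin counts.

    The multinomial coefficient is dropped because it does not depend on alpha.
    """
    a_sum = sum(alpha)
    ll = 0.0
    for x in counts:
        ll += math.lgamma(a_sum) - math.lgamma(a_sum + sum(x))
        for a_k, x_k in zip(alpha, x):
            ll += math.lgamma(a_k + x_k) - math.lgamma(a_k)
    return ll


def mm_step(counts: Counts, alpha: Sequence[float]) -> List[float]:
    """One minorization-maximization update of the Polya parameters."""
    a_sum = sum(alpha)
    # Shared denominator: sum over regions i of sum_{j < n_i} 1 / (|alpha| + j)
    denom = sum(1.0 / (a_sum + j) for x in counts for j in range(sum(x)))
    new_alpha = []
    for k, a_k in enumerate(alpha):
        # Numerator for bin k: sum over regions i of sum_{j < x_ik} 1 / (alpha_k + j)
        num = sum(1.0 / (a_k + j) for x in counts for j in range(x[k]))
        # Assumed floor keeps alpha_k > 0 if bin k is empty in every region
        new_alpha.append(max(a_k * num / denom, 1e-12))
    return new_alpha


def fit_polya_mm(counts: Counts, n_iter: int = 500, tol: float = 1e-8) -> List[float]:
    """Iterate MM updates from alpha = 1 until the parameters stop changing."""
    if not counts or sum(sum(x) for x in counts) == 0:
        raise ValueError("need at least one region with reads")
    alpha = [1.0] * len(counts[0])
    for _ in range(n_iter):
        new_alpha = mm_step(counts, alpha)
        converged = max(abs(a - b) for a, b in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


# Hypothetical read counts in 6 bins around 4 candidate sites (one strand)
regions = [
    [2, 20, 5, 1, 0, 1],
    [10, 8, 2, 3, 1, 0],
    [1, 15, 12, 2, 2, 1],
    [6, 25, 1, 0, 1, 1],
]
alpha = fit_polya_mm(regions)
shape = [a / sum(alpha) for a in alpha]  # estimated average peak shape
```

Apart from the positivity floor, each MM step cannot decrease the likelihood. Tracking `polya_loglik` across iterations is therefore a cheap sanity check. The normalized parameters in `shape` give the estimated average profile across bins.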

SUBMITTER: Wu H 

PROVIDER: S-EPMC3946423 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature


Publications

PolyaPeak: detecting transcription factor binding sites from ChIP-seq using peak shape information.

Hao Wu, Hongkai Ji

PLoS ONE, 7 March 2014, issue 3



Similar Datasets

| S-EPMC2804666 | biostudies-literature
| S-EPMC3287483 | biostudies-literature
| S-EPMC3245948 | biostudies-literature
| S-EPMC4046686 | biostudies-literature
| S-EPMC3032669 | biostudies-literature
| S-EPMC2917543 | biostudies-literature
| S-EPMC3799470 | biostudies-literature
| S-EPMC2853110 | biostudies-literature
| S-EPMC4413818 | biostudies-literature
| S-EPMC4618392 | biostudies-literature