Dataset Information

HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data.

ABSTRACT: Protein-DNA interaction constitutes a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism has been a daunting task due to the difficulty in characterizing protein-bound DNA on a large scale. A powerful technique has recently emerged that couples chromatin immunoprecipitation (ChIP) with next-generation sequencing (ChIP-Seq). This technique provides a direct survey of the cistrome of transcription factors and other chromatin-associated proteins. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed to analyze the massive amount of data generated by this method. Here we introduce HPeak, a Hidden Markov model (HMM)-based Peak-finding algorithm for analyzing ChIP-Seq data to identify protein-interacting genomic regions. In contrast to the majority of available ChIP-Seq analysis software packages, HPeak is a model-based approach allowing for rigorous statistical inference. This approach enables HPeak to accurately infer genomic regions enriched with sequence reads by assuming realistic probability distributions, in conjunction with a novel weighting scheme on the sequencing read coverage. Using biologically relevant data collections, we found that HPeak showed a higher prevalence of the expected transcription factor binding motifs in ChIP-enriched sequences relative to the control sequences when compared to other currently available ChIP-Seq analysis approaches. Additionally, in comparison to the ChIP-chip assay, ChIP-Seq provides higher resolution along with improved sensitivity and specificity of binding site detection. Additional file and the HPeak program are freely available at http://www.sph.umich.edu/csg/qin/HPeak.
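
The general idea of HMM-based peak finding described in the abstract can be illustrated with a minimal sketch. This is not the HPeak implementation: it assumes a two-state model (background, enriched) with plain Poisson emissions on per-bin read counts, and the names `viterbi_two_state` and `enriched_regions` as well as the parameters `lam_bg`, `lam_enriched`, and `p_stay` are hypothetical choices made for illustration. It does not include the emission distributions or the coverage weighting scheme that HPeak itself uses.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, lam_bg=1.0, lam_enriched=8.0, p_stay=0.95):
    """Most likely state path (0 = background, 1 = enriched) for binned counts.

    Assumed toy model, not HPeak's: Poisson emissions, symmetric transitions,
    uniform initial state probabilities.
    """
    if not counts:
        return []
    rates = (lam_bg, lam_enriched)
    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)

    # Initialise with a uniform prior over the two states.
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    back = []  # back[t][s] = best previous state for state s at step t + 1

    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            if stay >= switch:
                best, prev = stay, s
            else:
                best, prev = switch, 1 - s
            new_score.append(best + poisson_logpmf(k, rates[s]))
            ptr.append(prev)
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def enriched_regions(path):
    """Collapse a state path into half-open (start_bin, end_bin) intervals."""
    regions, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(path)))
    return regions


counts = [0, 1, 0, 2, 1, 9, 12, 10, 8, 1, 0, 1]  # made-up per-bin read counts
path = viterbi_two_state(counts)
print(enriched_regions(path))
```

In this toy setting the transition penalty keeps isolated moderate counts in the background state, while a run of consistently high counts is reported as one contiguous enriched region. A real analysis would estimate the emission parameters from the data rather than fix them by hand.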

SUBMITTER: Qin ZS

PROVIDER: S-EPMC2912305 | biostudies-literature | 2010 Jul

REPOSITORIES: biostudies-literature

Publications

HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data.

Qin Zhaohui S, Yu Jianjun, Shen Jincheng, Maher Christopher A, Hu Ming, Kalyana-Sundaram Shanker, Yu Jindan, Chinnaiyan Arul M

BMC Bioinformatics, 2010 Jul 2

PMID: 20598134

Similar Datasets

vi-HMM: a novel HMM-based method for sequence variant identification in short-read data. | S-EPMC6387560 | biostudies-literature

LOcating non-unique matched tags (LONUT) to improve the detection of the enriched regions for ChIP-seq data. | S-EPMC3692479 | biostudies-literature

Parameter estimation for robust HMM analysis of ChIP-chip data. | S-EPMC2536674 | biostudies-literature

Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data. | S-EPMC3136429 | biostudies-literature

ChIP-PaM: an algorithm to identify protein-DNA interaction using ChIP-Seq data. | S-EPMC2893127 | biostudies-literature

Computational identification of cell-specific variable regions in ChIP-seq data. | S-EPMC7229859 | biostudies-literature

CNV-guided multi-read allocation for ChIP-seq. | S-EPMC4184254 | biostudies-literature

dsRID: in silico identification of dsRNA regions using long-read RNA-seq data. | S-EPMC10628436 | biostudies-literature

A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. | S-EPMC2732366 | biostudies-literature

MUSIC: identification of enriched regions in ChIP-Seq experiments using a mappability-corrected multiscale signal processing framework. | S-EPMC4234855 | biostudies-literature