Dataset Information

A manually curated ChIP-seq benchmark demonstrates room for improvement in current peak-finder programs.

ABSTRACT: Chromatin immunoprecipitation (ChIP) followed by high throughput sequencing (ChIP-seq) is rapidly becoming the method of choice for discovering cell-specific transcription factor binding locations genome wide. By aligning sequenced tags to the genome, binding locations appear as peaks in the tag profile. Several programs have been designed to identify such peaks, but program evaluation has been difficult due to the lack of benchmark data sets. We have created benchmark data sets for three transcription factors by manually evaluating a selection of potential binding regions that cover typical variation in peak size and appearance. Performance of five programs on this benchmark showed, first, that external control or background data was essential to limit the number of false positive peaks from the programs. However, >80% of these peaks could be manually filtered out by visual inspection alone, without using additional background data, showing that peak shape information is not fully exploited in the evaluated programs. Second, none of the programs returned peak-regions that corresponded to the actual resolution in ChIP-seq data. Our results showed that ChIP-seq peaks should be narrowed down to 100-400 bp, which is sufficient to identify unique peaks and binding sites. Based on these results, we propose a meta-approach that gives improved peak definitions.
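
The abstract's two central ideas, that aligned tags pile up into a peak in the tag profile and that a reported region can be narrowed to a few hundred bp, can be illustrated with a minimal sketch. This is not the authors' method or any evaluated program: the function names `tag_profile` and `narrow_peak`, the tag positions, the 36 bp tag length, and the 200 bp width are all assumptions chosen for illustration, and strand information and background data are ignored.

```python
from collections import Counter


def tag_profile(tag_starts, tag_length):
    """Count how many aligned tags cover each genomic position."""
    coverage = Counter()
    for start in tag_starts:
        for pos in range(start, start + tag_length):
            coverage[pos] += 1
    return coverage


def narrow_peak(coverage, width):
    """Return a (start, end) window of `width` bp centred on the summit.

    The summit is the position with the highest coverage; ties are
    resolved in favour of the leftmost position.
    """
    summit = max(coverage, key=lambda pos: (coverage[pos], -pos))
    start = max(0, summit - width // 2)
    return start, start + width


# Hypothetical tag start positions on one chromosome (illustrative only).
tags = [1000, 1010, 1020, 1030, 1500]
profile = tag_profile(tags, tag_length=36)

# Narrow the region to 200 bp, a value inside the 100-400 bp range
# quoted in the abstract.
window = narrow_peak(profile, width=200)
print(window)
```

Here the four overlapping tags form a single peak, and the isolated tag at position 1500 falls outside the narrowed window. A real peak finder would additionally model background signal and peak shape, which is the gap the benchmark is reported to expose.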

SUBMITTER: Rye MB

PROVIDER: S-EPMC3045577 | biostudies-literature | 2011 Mar

REPOSITORIES: biostudies-literature

Publications

A manually curated ChIP-seq benchmark demonstrates room for improvement in current peak-finder programs.

Rye MB, Sætrom P, Drabløs F

Nucleic Acids Research, 2010 Nov 26, issue 4

PMID: 21113027

Similar Datasets

Peak Finder Metaserver - a novel application for finding peaks in ChIP-seq data.
| S-EPMC3870998 | biostudies-literature

PGMD: a comprehensive manually curated pharmacogenomic database.
| S-EPMC4819767 | biostudies-literature

Comparative analysis of commonly used peak calling programs for ChIP-Seq analysis.
| S-EPMC7808876 | biostudies-literature

Rhea--a manually curated resource of biochemical reactions.
| S-EPMC3245052 | biostudies-literature

A comprehensive manually-curated compendium of bovine transcription factors.
| S-EPMC6137171 | biostudies-literature

EnDisease: a manually curated database for enhancer-disease associations.
| S-EPMC6382991 | biostudies-literature

Manually curated dataset of catalytic peptides for ester hydrolysis.
| S-EPMC10294096 | biostudies-literature

CMBD: a manually curated cancer metabolic biomarker knowledge database.
| S-EPMC7947571 | biostudies-literature

PCOSDB: PolyCystic Ovary Syndrome Database for manually curated disease associated genes.
| S-EPMC4857457 | biostudies-other

Updates in Rhea--a manually curated resource of biochemical reactions.
| S-EPMC4384025 | biostudies-literature