Dataset Information

Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models.

ABSTRACT:

Background

Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions, such as using the Poisson distribution to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them.
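As an illustration of why the Poisson assumption can fail, note that a Poisson variable has equal mean and variance, so the variance-to-mean ratio of counts from a homogeneous region should be close to 1. The sketch below is not taken from the paper or from CROCS; the function name `dispersion_index` and the toy counts are assumptions made for this example only.

```python
import statistics

def dispersion_index(counts):
    """Return the sample variance divided by the mean of the counts.

    Under a Poisson model this ratio is expected to be near 1;
    values well above 1 indicate over-dispersion.
    """
    mean = statistics.fmean(counts)
    if mean == 0:
        raise ValueError("dispersion index is undefined when all counts are zero")
    return statistics.variance(counts) / mean

# Toy coverage counts (hypothetical, not real ChIP-seq data).
flat_counts = [3, 2, 4, 3, 3, 2, 4, 3]
bursty_counts = [0, 0, 0, 12]

print(dispersion_index(flat_counts))
print(dispersion_index(bursty_counts))
```

A ratio far above 1 within a region that should be homogeneous is one simple sign that a noise model with an extra dispersion parameter may fit better than a Poisson model.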

Results

Our comparisons on seven reference datasets of histone modifications (H3K36me3 and H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter reduces the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect the peaks more accurately than algorithms which rely on natural assumptions.
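For readers unfamiliar with penalized multiple changepoint detection, the following is a minimal sketch of the general idea, not the CROCS implementation or its API. It assumes a Poisson loss and a quadratic-time optimal-partitioning recursion, and it takes the penalty as a hand-picked number, whereas the abstract describes learning the penalty from labeled data. The names `poisson_cost` and `detect_changepoints` are hypothetical.

```python
import math

def poisson_cost(total, length):
    """Poisson negative log-likelihood of one segment at its best mean
    (terms that do not depend on the segmentation are dropped)."""
    if total == 0:
        return 0.0
    return total - total * math.log(total / length)

def detect_changepoints(counts, penalty):
    """Optimal partitioning: minimize total segment cost plus
    `penalty` per changepoint. Returns 0-based start indices of
    every segment after the first."""
    n = len(counts)
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    best = [0.0] * (n + 1)   # best[t]: optimal cost of counts[:t]
    best[0] = -penalty       # so the first segment carries no penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment in counts[:t]

    for t in range(1, n + 1):
        best_value, best_start = math.inf, 0
        for s in range(t):
            value = best[s] + poisson_cost(prefix[t] - prefix[s], t - s) + penalty
            if value < best_value:
                best_value, best_start = value, s
        best[t], last[t] = best_value, best_start

    # Walk back through the stored segment starts.
    changes = []
    t = n
    while t > 0:
        s = last[t]
        if s > 0:
            changes.append(s)
        t = s
    return sorted(changes)

# Toy signal (hypothetical): background, one enriched region, background.
signal = [1] * 5 + [20] * 5 + [1] * 5
print(detect_changepoints(signal, penalty=5.0))     # → [5, 10]
print(detect_changepoints(signal, penalty=1000.0))  # → []
```

The two calls show why the penalty matters: a moderate value recovers the enriched region, while a very large value suppresses every changepoint. Replacing `poisson_cost` with a loss that includes a dispersion parameter would be one way to express an alternative noise assumption in this framework.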

Conclusion

The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for H3K36me3 and H3K4me3 histone modifications.

SUBMITTER: Liehrmann A

PROVIDER: S-EPMC8201703 | biostudies-literature | 2021 Jun

REPOSITORIES: biostudies-literature

Publications

Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models.

Liehrmann A, Rigaill G, Hocking TD

BMC Bioinformatics, 2021 Jun 14, issue 1

PMID: 34126932

Similar Datasets

Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning. | S-EPMC5408812 | biostudies-literature

Identifying dispersed epigenomic domains from ChIP-Seq data. | S-EPMC3051331 | biostudies-literature

Shape-based peak identification for ChIP-Seq. | S-EPMC3032669 | biostudies-literature

Differential ATAC-seq and ChIP-seq peak detection using ROTS. | S-EPMC8253552 | biostudies-literature

NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data. | S-EPMC3672025 | biostudies-literature

Evaluation of algorithm performance in ChIP-seq peak detection. | S-EPMC2900203 | biostudies-other

peaksat: an R package for ChIP-seq peak saturation analysis. | S-EPMC9878872 | biostudies-literature

PeakRanger: a cloud-enabled peak caller for ChIP-seq data. | S-EPMC3103446 | biostudies-literature

OccuPeak: ChIP-Seq peak calling based on internal background modelling. | S-EPMC4061025 | biostudies-literature

Features that define the best ChIP-seq peak calling algorithms. | S-EPMC5429005 | biostudies-literature