Dataset Information

Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning.

ABSTRACT: Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given dataset. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts in a genome browser. We propose a supervised machine learning approach for ChIP-seq data analysis, using labels that encode qualitative judgments about which genomic regions contain or do not contain peaks. The main idea is to manually label a small subset of the genome, and then learn a model that makes consistent peak predictions on the rest of the genome. We created 7 new histone mark datasets with 12 826 visually determined labels, and analyzed 3 existing transcription factor datasets. We observed that default peak detection parameters yield high false positive rates, which can be reduced by learning parameters using a relatively small training set of labeled data from the same experiment type. We also observed that labels from different people are highly consistent. Overall, these data indicate that our supervised labeling method is useful for quantitatively training and testing peak detection algorithms. Labeled histone mark data: http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/ ; R package to compute the label error of predicted peaks: https://github.com/tdhock/PeakError . Contact: toby.hocking@mail.mcgill.ca or guil.bourque@mcgill.ca. Supplementary data are available at Bioinformatics online.
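
As a rough illustration of the label error idea described in the abstract, the following Python sketch counts how many labels a set of predicted peaks disagrees with, then picks the candidate parameter setting with the fewest errors on a labeled training set. It is not the implementation used by the PeakError R package: the two annotation names ("peaks" and "noPeaks"), the half-open coordinate convention, the function names and the toy data are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Peak = Tuple[int, int]        # assumed half-open interval [start, end) in base pairs
Label = Tuple[int, int, str]  # (start, end, annotation); annotation names are assumptions


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Return True when two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def label_errors(peaks: List[Peak], labels: List[Label]) -> Dict[str, int]:
    """Count labels that the predicted peaks contradict.

    Assumed semantics for this sketch:
      "peaks"   -> at least one predicted peak should overlap the region
      "noPeaks" -> no predicted peak should overlap the region
    """
    false_positives = 0
    false_negatives = 0
    for start, end, annotation in labels:
        n_overlapping = sum(overlaps(start, end, p_start, p_end)
                            for p_start, p_end in peaks)
        if annotation == "noPeaks":
            if n_overlapping > 0:
                false_positives += 1
        elif annotation == "peaks":
            if n_overlapping == 0:
                false_negatives += 1
        else:
            raise ValueError(f"unknown annotation: {annotation!r}")
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "errors": false_positives + false_negatives,
        "labels": len(labels),
    }


def choose_parameter(peaks_by_parameter: Dict[str, List[Peak]],
                     train_labels: List[Label]) -> str:
    """Return the parameter whose peaks give the fewest training label errors.

    Ties go to the parameter that appears first in the dictionary.
    """
    return min(peaks_by_parameter,
               key=lambda name: label_errors(peaks_by_parameter[name],
                                             train_labels)["errors"])


# Toy data (hypothetical): three labeled regions and two candidate peak sets.
train_labels = [(100, 200, "peaks"), (300, 400, "noPeaks"), (500, 600, "peaks")]
peaks_by_parameter = {
    "lenient": [(150, 180), (350, 360), (520, 530)],  # extra peak in the noPeaks region
    "strict": [(150, 180), (520, 530)],
}
best = choose_parameter(peaks_by_parameter, train_labels)
print(best, label_errors(peaks_by_parameter[best], train_labels))
```

In this toy setup a detector run with several parameter values would supply the candidate peak sets, and the labels would come from visual inspection in a genome browser; the selected parameter could then be checked against held-out labels in the same way.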

SUBMITTER: Hocking TD

PROVIDER: S-EPMC5408812 | biostudies-literature | 2017 Feb

REPOSITORIES: biostudies-literature

Publications

Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning.

Hocking TD, Goerner-Potvin P, Morin A, Shao X, Pastinen T, Bourque G

Bioinformatics (Oxford, England), 2017-02-01, issue 4

PMID: 27797775

Similar Datasets

Picking ChIP-seq peak detectors for analyzing chromatin modification experiments.
| S-EPMC3351193 | biostudies-literature

AIControl: replacing matched control experiments with machine learning improves ChIP-seq peak identification.
| S-EPMC6547432 | biostudies-literature

Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models.
| S-EPMC8201703 | biostudies-literature

Shape-based peak identification for ChIP-Seq.
| S-EPMC3032669 | biostudies-literature

Differential ATAC-seq and ChIP-seq peak detection using ROTS.
| S-EPMC8253552 | biostudies-literature

Optimizing sample size for supervised machine learning with bulk transcriptomic sequencing: a learning curve approach.
| S-EPMC11899567 | biostudies-literature

Optimizing Sample Size for Supervised Machine Learning with Bulk Transcriptomic Sequencing: A Learning Curve Approach.
| S-EPMC11419172 | biostudies-literature

CNN-Peaks: ChIP-Seq peak detection pipeline using convolutional neural networks that imitate human visual inspection.
| S-EPMC7220942 | biostudies-literature

NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data.
| S-EPMC3672025 | biostudies-literature

Evaluation of algorithm performance in ChIP-seq peak detection.
| S-EPMC2900203 | biostudies-other