
Dataset Information


CNV-guided multi-read allocation for ChIP-seq.


ABSTRACT: In chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and other short-read sequencing experiments, a considerable fraction of the short reads align to multiple locations on the reference genome (multi-reads). Inferring the origin of multi-reads is critical for accurately mapping reads to repetitive regions. Current state-of-the-art multi-read allocation algorithms rely on the read counts in the local neighborhood of the alignment locations and ignore the variation in the copy numbers of these regions. Copy-number variation (CNV) can directly affect the read densities and, therefore, bias allocation of multi-reads. We propose cnvCSEM (CNV-guided ChIP-Seq by expectation-maximization algorithm), a flexible framework that incorporates CNV in multi-read allocation. cnvCSEM eliminates the CNV bias in multi-read allocation by initializing the read allocation algorithm with CNV-aware initial values. Our data-driven simulations illustrate that cnvCSEM leads to higher read coverage with satisfactory accuracy and lower loss in read-depth recovery (estimation). We evaluate the biological relevance of the cnvCSEM-allocated reads and the resultant peaks with the analysis of several ENCODE ChIP-seq datasets. Available at http://www.stat.wisc.edu/?qizhang/. Contact: qizhang@stat.wisc.edu or keles@stat.wisc.edu. Supplementary data are available at Bioinformatics online.
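The idea of starting an expectation-maximization allocation from CNV-aware values can be illustrated with a toy sketch. This is not the cnvCSEM implementation: the function name `allocate_multireads`, the region-level weights, the pseudocount of 1, and the choice to divide the initial weights by copy number are all assumptions made for illustration only.

```python
from typing import Dict, List


def allocate_multireads(
    unique_counts: Dict[str, float],
    copy_number: Dict[str, float],
    multireads: List[List[str]],
    n_iter: int = 10,
) -> List[Dict[str, float]]:
    """Toy EM-style allocation of multi-reads across candidate regions.

    Hypothetical sketch: initial region weights are unique-read counts
    (plus a pseudocount) divided by copy number, so that amplified
    regions do not attract multi-reads purely because of their copies.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")

    # CNV-aware initial values (assumed form, not taken from the paper).
    weights = {
        r: (unique_counts[r] + 1.0) / copy_number[r] for r in unique_counts
    }

    alloc: List[Dict[str, float]] = []
    for _ in range(n_iter):
        # E-step: split each multi-read in proportion to current weights.
        alloc = []
        for candidates in multireads:
            total = sum(weights[r] for r in candidates)
            alloc.append({r: weights[r] / total for r in candidates})

        # M-step: new weight = unique reads + pseudocount + allocated fractions.
        weights = {r: unique_counts[r] + 1.0 for r in unique_counts}
        for fractions in alloc:
            for r, f in fractions.items():
                weights[r] += f

    return alloc
```

For example, with `unique_counts = {"A": 30, "B": 10}` and `copy_number = {"A": 3.0, "B": 1.0}`, the first E-step treats region A as having roughly the same per-copy density as B rather than three times the signal.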

SUBMITTER: Zhang Q 

PROVIDER: S-EPMC4184254 | biostudies-literature | 2014 Oct

REPOSITORIES: biostudies-literature


Publications

CNV-guided multi-read allocation for ChIP-seq.

Zhang Qi, Keleş Sündüz

Bioinformatics (Oxford, England), 2014 Jun 24, issue 20



Similar Datasets

| S-EPMC2656530 | biostudies-literature
| S-EPMC3136429 | biostudies-literature
| S-EPMC2912305 | biostudies-literature
| S-EPMC3988018 | biostudies-literature
| S-EPMC3658457 | biostudies-literature
| S-EPMC4344487 | biostudies-literature
| S-EPMC11329654 | biostudies-literature
| S-EPMC4765064 | biostudies-literature
| S-EPMC3053263 | biostudies-literature
| S-EPMC3098059 | biostudies-literature