Dataset Information

Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

ABSTRACT: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
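The fractional-count idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `allocate_fractional`, the input layout, and the use of a per-alignment score as the weight are all assumptions made for illustration, since this record does not specify the weighting scheme. Each read contributes a total count of 1, split across its candidate locations in proportion to the weights.

```python
from collections import defaultdict

def allocate_fractional(alignments):
    """Split each read's unit count across its candidate locations.

    alignments: dict mapping a read ID to a list of (location, weight) pairs.
    The weights are hypothetical alignment scores (an assumption, not the
    paper's scheme). A uni-read has one pair; a multi-read has several.
    """
    counts = defaultdict(float)
    for read_id, hits in alignments.items():
        total = sum(weight for _, weight in hits)
        if total <= 0:
            continue  # skip reads with no usable weight
        for location, weight in hits:
            # Each location receives its proportional share of one read.
            counts[location] += weight / total
    return dict(counts)

# Hypothetical input: one uni-read and one multi-read with two candidate sites.
example = {
    "read_1": [("chr1:100", 1.0)],
    "read_2": [("chr1:100", 3.0), ("chr2:500", 1.0)],
}
counts = allocate_fractional(example)
```

Because every read still sums to one count, the total over all locations equals the number of reads allocated, so downstream peak calling sees greater depth in repetitive regions without any read being counted twice.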

SUBMITTER: Chung D

PROVIDER: S-EPMC3136429 | biostudies-literature | 2011 Jul

REPOSITORIES: biostudies-literature

Publications

Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

Chung D, Kuan PF, Li B, Sanalkumar R, Liang K, Bresnick EH, Dewey C, Keleş S

PLoS Computational Biology. 2011 Jul 14; 7(7)

PMID: 21779159

Similar Datasets

Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping. | S-EPMC4618727 | biostudies-literature

Discovering unknown human and mouse transcription factor binding sites and their characteristics from ChIP-seq data. | S-EPMC8158016 | biostudies-literature

HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data. | S-EPMC2912305 | biostudies-literature

Pinpointing transcription factor binding sites from ChIP-seq data with SeqSite. | S-EPMC3287483 | biostudies-literature

Optimized detection of transcription factor-binding sites in ChIP-seq experiments. | S-EPMC3245948 | biostudies-literature

Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. | S-EPMC2917543 | biostudies-literature

PolyaPeak: detecting transcription factor binding sites from ChIP-seq using peak shape information. | S-EPMC3946423 | biostudies-literature

Identification of transcription factor binding sites from ChIP-seq data at high resolution. | S-EPMC3799470 | biostudies-literature

CNV-guided multi-read allocation for ChIP-seq. | S-EPMC4184254 | biostudies-literature

Highly repetitive DNA sequences in cyanobacterial genomes. | S-EPMC208921 | biostudies-other