
Dataset Information


Systematic bias in high-throughput sequencing data and its correction by BEADS.


ABSTRACT: Genomic sequences obtained through high-throughput sequencing are not uniformly distributed across the genome. For example, sequencing data of total genomic DNA show significant, yet unexpected enrichments on promoters and exons. This systematic bias is a particular problem for techniques such as chromatin immunoprecipitation, where the signal for a target factor is plotted across genomic features. We have focused on data obtained from Illumina's Genome Analyser platform, where at least three factors contribute to sequence bias: GC content, mappability of sequencing reads, and regional biases that might be generated by local structure. We show that relying on input control as a normalizer is not generally appropriate due to sample-to-sample variation in bias. To correct sequence bias, we present BEADS (bias elimination algorithm for deep sequencing), a simple three-step normalization scheme that successfully unmasks real binding patterns in ChIP-seq data. We suggest that this procedure be done routinely prior to data interpretation and downstream analyses.
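The three bias sources named in the abstract (GC content, mappability, regional bias) suggest a sequential correction. The following is a minimal Python sketch of a BEADS-style pipeline under stated assumptions, not the authors' implementation: it assumes reads have already been counted in fixed-size genomic bins with a known GC fraction and mappable fraction per bin, and the function names, the ten GC strata, the 0.5 mappability cutoff and the five-bin median window are all hypothetical, illustrative choices.

```python
from collections import defaultdict
from statistics import mean, median
from typing import List, Optional


def gc_correct(counts: List[float], gc: List[float], n_strata: int = 10) -> List[float]:
    """Hypothetical step 1: divide each bin's count by the mean count of bins
    with similar GC content (its GC-expected coverage), estimated from the
    sample itself rather than from an input control."""
    strata = [min(int(g * n_strata), n_strata - 1) for g in gc]
    pooled = defaultdict(list)
    for c, s in zip(counts, strata):
        pooled[s].append(c)
    expected = {s: mean(v) for s, v in pooled.items()}
    return [c / expected[s] if expected[s] > 0 else 0.0 for c, s in zip(counts, strata)]


def mappability_correct(values: List[float], mappability: List[float],
                        min_map: float = 0.5) -> List[Optional[float]]:
    """Hypothetical step 2: rescale by the fraction of uniquely mappable
    positions; bins below the cutoff are masked as None."""
    return [v / m if m >= min_map else None for v, m in zip(values, mappability)]


def regional_correct(values: List[Optional[float]],
                     half_window: int = 2) -> List[Optional[float]]:
    """Hypothetical step 3: divide by the median of the surrounding window,
    ignoring masked bins, to flatten broad regional trends."""
    out: List[Optional[float]] = []
    for i, v in enumerate(values):
        if v is None:
            out.append(None)
            continue
        lo, hi = max(0, i - half_window), min(len(values), i + half_window + 1)
        neighbours = [x for x in values[lo:hi] if x is not None]  # always contains v
        local = median(neighbours)
        out.append(v / local if local > 0 else None)
    return out


def beads_like_normalize(counts: List[float], gc: List[float],
                         mappability: List[float]) -> List[Optional[float]]:
    """Apply the three corrections in sequence: GC, mappability, regional."""
    return regional_correct(mappability_correct(gc_correct(counts, gc), mappability))


# Coverage that differs only because of GC content normalizes to a flat profile,
# while a genuine local enrichment survives as a ratio above 1.
print(beads_like_normalize([10, 20, 10, 20], [0.3, 0.6, 0.3, 0.6], [1.0] * 4))
print(beads_like_normalize([10, 10, 10, 40, 10, 10, 10], [0.4] * 7, [1.0] * 7))
```

Because each step divides by an expectation estimated from the sample itself, this sketch needs no input control, in line with the abstract's point that input bias varies from sample to sample. In real data the regional window would be far wider than a typical binding site, so that a peak does not inflate its own local median.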

SUBMITTER: Cheung MS 

PROVIDER: S-EPMC3159482 | biostudies-literature | 2011 Aug

REPOSITORIES: biostudies-literature


Publications

Systematic bias in high-throughput sequencing data and its correction by BEADS.

Ming-Sin Cheung, Thomas A. Down, Isabel Latorre, Julie Ahringer

Nucleic Acids Research, 2011-06-06, issue 15



Similar Datasets

2011-05-25 | GSE29427 | GEO
2011-05-25 | E-GEOD-29427 | biostudies-arrayexpress
| S-EPMC4719071 | biostudies-literature
| S-EPMC3295828 | biostudies-literature
| S-EPMC3149584 | biostudies-literature
| S-EPMC3945112 | biostudies-literature
| S-EPMC5607347 | biostudies-literature
| S-EPMC3166835 | biostudies-literature
| S-EPMC3378858 | biostudies-other
| S-EPMC4833864 | biostudies-other