Dataset Information

Sources of PCR-induced distortions in high-throughput sequencing data sets.

ABSTRACT: PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error (bias, stochasticity, template switches and polymerase errors) on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules.

SUBMITTER: Kebschull JM

PROVIDER: S-EPMC4666380 | biostudies-literature | 2015 Dec

REPOSITORIES: biostudies-literature

Publications

Sources of PCR-induced distortions in high-throughput sequencing data sets.

Kebschull JM, Zador AM

Nucleic Acids Research, 2015-07-17, issue 21

PMID: 26187991

Similar Datasets

Quantifying selection in high-throughput Immunoglobulin sequencing data sets.
| S-EPMC3458526 | biostudies-other

Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data.
| S-EPMC5017645 | biostudies-literature

Substantial biases in ultra-short read data sets from high-throughput DNA sequencing.
| S-EPMC2532726 | biostudies-literature

PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets.
| S-EPMC4429651 | biostudies-literature

Protein complex-based analysis framework for high-throughput data sets.
| S-EPMC3756668 | biostudies-literature

Effect of PCR extension temperature on high-throughput sequencing.
| S-EPMC3026866 | biostudies-literature

Compression of structured high-throughput sequencing data.
| S-EPMC3832420 | biostudies-literature

High-Throughput Tabular Data Processor - Platform independent graphical tool for processing large data sets.
| S-EPMC5809091 | biostudies-literature

Proficiency Testing of Virus Diagnostics Based on Bioinformatics Analysis of Simulated In Silico High-Throughput Sequencing Data Sets.
| S-EPMC6663916 | biostudies-literature

Digital PCR provides sensitive and absolute calibration for high throughput sequencing.
| S-EPMC2667538 | biostudies-literature