Dataset Information

High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing.

ABSTRACT: A major limitation of high-throughput DNA sequencing is the high rate of erroneous base calls produced. For instance, Illumina sequencing machines produce errors at a rate of ~0.1-1 × 10⁻² per base sequenced. These technologies typically produce billions of base calls per experiment, translating to millions of errors. We have developed a unique library preparation strategy, "circle sequencing," which allows for robust downstream computational correction of these errors. In this strategy, DNA templates are circularized, copied multiple times in tandem with a rolling circle polymerase, and then sequenced on any high-throughput sequencing machine. Each read produced is computationally processed to obtain a consensus sequence of all linked copies of the original molecule. Physically linking the copies ensures that each copy is independently derived from the original molecule and allows for efficient formation of consensus sequences. The circle-sequencing protocol precedes standard library preparations and is therefore suitable for a broad range of sequencing applications. We tested our method using the Illumina MiSeq platform and obtained errors in our processed sequencing reads at a rate as low as 7.6 × 10⁻⁶ per base sequenced, dramatically improving the error rate of Illumina sequencing and putting error on par with low-throughput, but highly accurate, Sanger sequencing. Circle sequencing also had substantially higher efficiency and lower cost than existing barcode-based schemes for correcting sequencing errors.
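The consensus step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the repeat length (`period`) is already known, that the read starts at a copy boundary, and that errors are substitutions only (no insertions or deletions). The function name `consensus_from_tandem_read` and its parameters are hypothetical, chosen for this example.

```python
from collections import Counter


def consensus_from_tandem_read(read: str, period: int, min_copies: int = 3) -> str:
    """Majority-vote consensus over tandem copies of one template.

    Assumptions (not from the source): the repeat period is known,
    the read begins at a copy boundary, and there are no indels.
    """
    if period <= 0:
        raise ValueError("period must be positive")

    # Keep only complete copies; a trailing partial copy is ignored.
    copies = [read[i:i + period] for i in range(0, len(read) - period + 1, period)]
    if len(copies) < min_copies:
        raise ValueError(f"need at least {min_copies} complete copies, got {len(copies)}")

    consensus = []
    for column in zip(*copies):
        # Most frequent base at this position across all copies.
        base, count = Counter(column).most_common(1)[0]
        # Emit 'N' unless one base holds a strict majority.
        consensus.append(base if count * 2 > len(copies) else "N")
    return "".join(consensus)
```

Because each copy in the read descends from the same original molecule, a substitution in one copy is outvoted by the others at that position. A real implementation would also need to detect the repeat period from the read itself and weigh base quality scores, both of which this sketch leaves out.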

SUBMITTER: Lou DI

PROVIDER: S-EPMC3856802 | biostudies-other | 2013 Dec

REPOSITORIES: biostudies-other

Publications

High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing.

Lou DI, Hussmann JA, McBee RM, Acevedo A, Andino R, Press WH, Sawyer SL

Proceedings of the National Academy of Sciences of the United States of America, 2013-11-15, issue 49

PMID: 24243955

Similar Datasets

Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction.
| S-EPMC4719071 | biostudies-literature

Empirical assessment of sequencing errors for high throughput pyrosequencing data.
| S-EPMC3852801 | biostudies-literature

Efficient storage of high throughput DNA sequencing data using reference-based compression.
| S-EPMC3083090 | biostudies-literature

Automated single-cell proteomics covering over four orders of magnitude at high throughput
2024-01-14 | MSV000093867 | MassIVE

High-Throughput DNA sequencing of ancient wood.
| S-EPMC5896730 | biostudies-literature

Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations.
| S-EPMC5937187 | biostudies-literature

Biofoundry-Scale DNA Assembly Validation Using Cost-Effective High-Throughput Long-Read Sequencing.
| S-EPMC10877595 | biostudies-literature

Mapping DNA methylation with high-throughput nanopore sequencing.
| S-EPMC5704956 | biostudies-literature

Indel-correcting DNA barcodes for high-throughput sequencing.
| S-EPMC6142223 | biostudies-literature

Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire.
| S-EPMC3045962 | biostudies-literature