Dataset Information

Denoising DNA deep sequencing data: high-throughput sequencing errors and their correction.

ABSTRACT: Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. A large variety of programs is available for error removal in sequencing read data. They differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and the data types for which these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.

SUBMITTER: Laehnemann D

PROVIDER: S-EPMC4719071 | biostudies-literature | 2016 Jan

REPOSITORIES: biostudies-literature

Publications

Denoising DNA deep sequencing data: high-throughput sequencing errors and their correction.

David Laehnemann, Arndt Borkhardt, Alice Carolyn McHardy

Briefings in Bioinformatics, 2015-05-29, issue 1

PMID: 26026159

Similar Datasets

Empirical assessment of sequencing errors for high throughput pyrosequencing data.
| S-EPMC3852801 | biostudies-literature

High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing.
| S-EPMC3856802 | biostudies-other

Systematic bias in high-throughput sequencing data and its correction by BEADS.
| S-EPMC3159482 | biostudies-literature

Universal count correction for high-throughput sequencing.
| S-EPMC3945112 | biostudies-literature

THetA: inferring intra-tumor heterogeneity from high-throughput DNA sequencing data.
| S-EPMC4054893 | biostudies-literature

DiNAMO: highly sensitive DNA motif discovery in high-throughput sequencing data.
| S-EPMC5996464 | biostudies-literature

SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data.
| S-EPMC4868289 | biostudies-literature

Systematic bias in high-throughput sequencing data and its correction by BEADS
2011-05-25 | E-GEOD-29427 | biostudies-arrayexpress

High-throughput, high-fidelity HLA genotyping with deep sequencing.
| S-EPMC3365218 | biostudies-literature

Systematic bias in high-throughput sequencing data and its correction by BEADS
2011-05-25 | GSE29427 | GEO