Dataset Information

Discovering motifs that induce sequencing errors.


ABSTRACT:

Background

Elevated sequencing error rates are the predominant obstacle to single-nucleotide polymorphism (SNP) detection, which is a major goal of most current studies using next-generation sequencing (NGS). Beyond routinely handled generic sources of error, certain base-calling errors are tied to specific sequence patterns. Statistically principled ways to associate sequence patterns with base-calling errors have not previously been described. Existing approaches either lose substantial power, because they relate errors to individual genomic positions rather than to motifs, or do not properly distinguish motif-induced from sequence-unspecific sources of error.
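To make the distinction between motif-induced and sequence-unspecific errors concrete, here is a minimal Python sketch. It assumes, as an illustration rather than as the authors' method, that a motif-induced error shows up as a strand asymmetry: at the base directly after a motif occurrence, reads that sequence through the motif should carry more mismatches than reads covering the same base from the opposite strand, whereas generic errors should hit both strands alike. The function names `positions_after_motif`, `fisher_greater` and `strand_bias_pvalue` are hypothetical, and the one-sided Fisher's exact test stands in for whatever statistic the published framework actually uses.

```python
from math import comb


def positions_after_motif(reference: str, motif: str) -> list[int]:
    """0-based positions directly downstream of each forward-strand occurrence
    of `motif` in `reference` (overlapping occurrences included)."""
    hits = []
    start = reference.find(motif)
    while start != -1:
        pos = start + len(motif)
        if pos < len(reference):  # skip an occurrence that ends the sequence
            hits.append(pos)
        start = reference.find(motif, start + 1)
    return hits


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    P(X >= a) under the hypergeometric null with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


def strand_bias_pvalue(err_fwd: int, cov_fwd: int, err_rev: int, cov_rev: int) -> float:
    """Test whether mismatches after a motif are enriched in forward-strand reads
    (which read through the motif) relative to reverse-strand reads covering the
    same bases. Counts are assumed to be summed over all motif occurrences."""
    return fisher_greater(err_fwd, cov_fwd - err_fwd, err_rev, cov_rev - err_rev)


# Illustrative usage with made-up counts (not data from the paper).
sites = positions_after_motif("AGGCAGGCT", "GGC")  # [4, 8]
p = strand_bias_pvalue(30, 200, 2, 210)
```

Pooling mismatch and coverage counts over every occurrence of a motif, instead of testing each genomic position on its own, is what lets weak per-position signals accumulate into a detectable effect, which is the power argument made in the paragraph above.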

Results

Here, for the first time, we describe a statistically rigorous framework for the discovery of motifs that induce sequencing errors. We apply our method to several datasets from Illumina GA IIx, HiSeq 2000, and MiSeq sequencers. We confirm previously known error-causing sequence contexts and report new, more specific ones.

Conclusions

Checking for error-inducing motifs should be included in SNP calling pipelines to avoid false positives. To facilitate filtering of sets of putative SNPs, we provide tracks of error-prone genomic positions (in BED format).
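As a minimal sketch of how such a BED track could be applied to a set of putative SNPs, the Python below assumes a standard BED layout (chromosome, 0-based start, exclusive end in the first three whitespace-separated columns) and SNP calls given as (chromosome, 1-based position) pairs, as in the VCF `POS` column. The functions `load_bed`, `is_error_prone` and `filter_snps` are hypothetical; the file names and exact column content of the distributed tracks are not described here.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into merged, sorted intervals per chromosome.
    BED coordinates are 0-based and half-open: [chromStart, chromEnd)."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    index = {}
    for chrom, ivs in raw.items():
        merged = []
        for s, e in sorted(ivs):
            if merged and s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)  # merge overlapping or touching intervals
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def is_error_prone(index, chrom, pos1):
    """True if the 1-based position `pos1` lies inside an interval of the track."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    pos0 = pos1 - 1  # convert to BED's 0-based coordinates
    i = bisect.bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def filter_snps(snps, index):
    """Keep only (chrom, pos1) calls that fall outside every error-prone interval."""
    return [(c, p) for c, p in snps if not is_error_prone(index, c, p)]
```

Merging intervals once at load time keeps each lookup to a single binary search. The 1-based to 0-based conversion in `is_error_prone` is the step most easily gotten wrong when mixing VCF and BED coordinates.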

Availability

http://discovering-cse.googlecode.com.

SUBMITTER: Allhoff M 

PROVIDER: S-EPMC3622629 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Discovering motifs that induce sequencing errors.

Manuel Allhoff, Alexander Schönhuth, Marcel Martin, Ivan G. Costa, Sven Rahmann, Tobias Marschall

BMC Bioinformatics, 2013-04-10


Similar Datasets

| S-EPMC3655836 | biostudies-literature
| S-EPMC2323616 | biostudies-literature
| S-EPMC1829477 | biostudies-literature
| S-EPMC6547422 | biostudies-literature
| S-EPMC55461 | biostudies-literature
| S-EPMC6100135 | biostudies-literature
| S-EPMC4168299 | biostudies-literature
| S-EPMC2770069 | biostudies-literature
| S-EPMC6636396 | biostudies-literature
| S-EPMC4635656 | biostudies-literature