Dataset Information

Fast mapping of short sequences with mismatches, insertions and deletions using index structures.

ABSTRACT: With few exceptions, current methods for short read mapping use simple seed heuristics to speed up the search. Most of the underlying matching models neglect the need to allow not only mismatches, but also insertions and deletions. Current evaluations indicate, however, that very different error models apply to the novel high-throughput sequencing methods. While mismatches are the most frequent error type in Illumina reads, reads produced by 454's GS FLX predominantly contain insertions and deletions (indels). Even though 454 sequencers are able to produce longer reads, the method is frequently applied to small RNA (miRNA and siRNA) sequencing. Fast and accurate matching, in particular of short reads with diverse errors, is therefore a pressing practical problem. We introduce a matching model for short reads that can cope with indels as well as mismatches. It addresses different error models. For example, it can handle leading and trailing contaminations caused by primers and poly-A tails in transcriptomics, or the length-dependent increase of error rates. In these contexts, it thus simplifies the tedious and error-prone trimming step. For efficient searches, our method uses index structures in the form of enhanced suffix arrays. In a comparison with current methods for short read mapping, the presented approach shows significantly increased performance not only for 454 reads, but also for Illumina reads. Our approach is implemented in the software segemehl, available at http://www.bioinf.uni-leipzig.de/Software/segemehl/.
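To make the matching model concrete, here is a minimal sketch of approximate read matching over a suffix index. It is a classic textbook technique, not segemehl's actual algorithm. The sketch assumes a plain suffix array built naively (an enhanced suffix array would add LCP and child tables for faster traversal, which are omitted here). It walks suffix-array intervals depth-first while maintaining one edit-distance column per reference character, and it reports every start position where the whole read aligns with at most `k` mismatches, insertions or deletions. The function names `suffix_array` and `approx_search` and the example sequence are hypothetical. The local alignment needed to clip primer or poly-A contamination is not modeled.

```python
from typing import Dict, List


def suffix_array(text: str) -> List[int]:
    # Hypothetical helper: naive construction that sorts start positions by suffix.
    # Fine for illustration; real index builders run in (near) linear time.
    return sorted(range(len(text)), key=lambda i: text[i:])


def approx_search(text: str, sa: List[int], read: str, k: int) -> Dict[int, int]:
    """Hypothetical helper: map each reference start position to the smallest edit
    distance (<= k) between `read` and a reference substring starting there."""
    m = len(read)
    hits: Dict[int, int] = {}

    def descend(lo: int, hi: int, depth: int, col: List[int]) -> None:
        # sa[lo:hi] share the current path (their first `depth` characters);
        # col[r] is the edit distance between read[:r] and that path.
        if col[m] <= k:
            for i in range(lo, hi):
                hits[sa[i]] = min(hits.get(sa[i], col[m]), col[m])
        i = lo
        while i < hi:
            p = sa[i] + depth
            if p >= len(text):  # this suffix is exhausted at this depth
                i += 1
                continue
            c = text[p]
            j = i  # find the sub-interval whose next character is c
            while j < hi and sa[j] + depth < len(text) and text[sa[j] + depth] == c:
                j += 1
            new = [col[0] + 1]
            for r in range(1, m + 1):
                new.append(min(col[r - 1] + (read[r - 1] != c),  # match or mismatch
                               col[r] + 1,       # reference char skipped (deletion in read)
                               new[r - 1] + 1))  # read char unmatched (insertion in read)
            # Column minima never decrease along a path, so prune hopeless branches.
            if min(new) <= k:
                descend(i, j, depth + 1, new)
            i = j

    descend(0, len(sa), 0, list(range(m + 1)))
    return hits


if __name__ == "__main__":
    ref = "ACGTTGCAACGGT"  # made-up reference for illustration
    sa = suffix_array(ref)
    print(sorted(approx_search(ref, sa, "ACGT", k=1).items()))
```

Because every reference substring that shares a prefix is explored only once per interval, repeated regions of the reference are searched together. This is the main reason index-based approaches scale better than scanning each occurrence separately.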

SUBMITTER: Hoffmann S

PROVIDER: S-EPMC2730575 | biostudies-literature | 2009 Sep

REPOSITORIES: biostudies-literature

Publications

Fast mapping of short sequences with mismatches, insertions and deletions using index structures.

Steve Hoffmann, Christian Otto, Stefan Kurtz, Cynthia M. Sharma, Philipp Khaitovich, Jörg Vogel, Peter F. Stadler, Jörg Hackermüller

PLoS Computational Biology, 2009 Sep 11, issue 9

PMID: 19750212

Similar Datasets

Mapping insertions, deletions and SNPs on Venter's chromosomes.
| S-EPMC2696090 | biostudies-literature

Optical genome mapping identified deletions, inversions, and insertions in hemophilia.
| S-EPMC11787463 | biostudies-literature

Fast and efficient short read mapping based on a succinct hash index.
| S-EPMC5845352 | biostudies-literature

Sequence context affects the rate of short insertions and deletions in flies and primates.
| S-EPMC2374710 | biostudies-literature

Large genomic fragment deletions and insertions in mouse using CRISPR/Cas9.
| S-EPMC4372442 | biostudies-literature

Probabilistic phylogenetic inference with insertions and deletions.
| S-EPMC2527138 | biostudies-literature

Human gene targeting favors insertions over deletions.
| S-EPMC2940567 | biostudies-literature

Unexpectedly High Levels of Inverted Re-Insertions Using Paired sgRNAs for Genomic Deletions.
| S-EPMC7565582 | biostudies-literature

High-resolution mapping reveals the mechanism and contribution of genome insertions and deletions to RNA virus evolution.
| S-EPMC10400975 | biostudies-literature

CRISPR/Cas9 systems have off-target activity with insertions or deletions between target DNA and guide RNA sequences.
| S-EPMC4066799 | biostudies-literature