Dataset Information

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping.

ABSTRACT: Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend based mappers differ by significantly more errors than what is typically allowed. Such error-abundant sequence pairs needlessly waste resources and severely hinder the performance of read mappers. Therefore, it is crucial to develop a fast and accurate filter that can rapidly and efficiently detect error-abundant string pairs and remove them from consideration before more computationally expensive methods are used.We present a simple and efficient algorithm, Shifted Hamming Distance (SHD), which accelerates the alignment verification procedure in read mapping, by quickly filtering out error-abundant sequence pairs using bit-parallel and SIMD-parallel operations. SHD only filters string pairs that contain more errors than a user-defined threshold, making it fully comprehensive. It also maintains high accuracy with moderate error threshold (up to 5% of the string length) while achieving a 3-fold speedup over the best previous algorithm (Gene Myers's bit-vector algorithm). SHD is compatible with all mappers that perform sequence alignment for verification.

SUBMITTER: Xin H

PROVIDER: S-EPMC4426831 | biostudies-other | 2015 May

REPOSITORIES: biostudies-other

ACCESS DATA

Publications

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping.

Xin Hongyi H Greth John J Emmons John J Pekhimenko Gennady G Kingsford Carl C Alkan Can C Mutlu Onur O

Bioinformatics (Oxford, England) 20150110 10

<h4>Motivation</h4>Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend based mappers differ by significantly more errors than what is typically allowed. Such error-abun ...[more]

PMID: 25577434

Dataset Information

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping.

Publications

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Similar Datasets

Fast and accurate read alignment for resequencing.
| S-EPMC3436849 | biostudies-literature

Fast and accurate short read alignment with Burrows-Wheeler transform.
| S-EPMC2705234 | biostudies-literature

Fast and accurate long-read alignment with Burrows-Wheeler transform.
| S-EPMC2828108 | biostudies-literature

RRBSMAP: a fast, accurate and user-friendly alignment tool for reduced representation bisulfite sequencing.
| S-EPMC3268241 | biostudies-literature

deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index.
| S-EPMC6913027 | biostudies-literature

Fast gapped-read alignment with Bowtie 2.
| S-EPMC3322381 | biostudies-literature

Shouji: a fast and efficient pre-alignment filter for sequence alignment.
| S-EPMC6821304 | biostudies-literature

Fast and accurate long-read assembly with wtdbg2.
| S-EPMC7004874 | biostudies-literature

LSCplus: a fast solution for improving long read accuracy by short read alignment.
| S-EPMC5103424 | biostudies-literature

WALT: fast and accurate read mapping for bisulfite sequencing.
| S-EPMC5181568 | biostudies-literature