Dataset Information

Optimal seed solver: optimizing seed selection in read mapping.

ABSTRACT:

Motivation

Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines its sensitivity, while the total frequency of all selected seeds determines its speed. Modern seed-and-extend mappers usually select seeds with either an equal, fixed-length scheme or an inflexible placement scheme. Both limit the mapper's ability to select less frequent seeds and thereby speed up the mapping process. It is therefore crucial to develop a new algorithm that can adjust both the individual seed lengths and the seed placement in order to derive less frequent seeds.

Results

We present the Optimal Seed Solver (OSS), a dynamic programming algorithm that discovers the least frequently occurring set of x seeds in an L-base-pair read in O(x × L) operations on average and in O(x × L²) operations in the worst case, while generating a maximum of O(L²) seed frequency database lookups. We compare OSS against four state-of-the-art seed selection schemes and observe that OSS provides a 3-fold reduction in average seed frequency over the best previous seed selection optimizations.
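The dynamic programming idea behind this kind of seed selection can be illustrated with a small sketch. It is not the OSS implementation from the repository linked below: it is a naive O(x × L²) recurrence without any of the optimizations OSS uses. It also rests on several assumptions. A hypothetical frequency oracle (here, exact substring counting over a toy reference string) stands in for a seed frequency database. The x seeds are assumed to tile the read end to end, which loses no optimality when a seed's frequency can only fall as it is extended. Finally, `min_len` is a hypothetical minimum seed length. The names `optimal_seeds` and `count_occurrences` are illustrative only.

```python
def count_occurrences(reference: str, seed: str) -> int:
    """Hypothetical frequency oracle: number of (possibly overlapping)
    exact occurrences of `seed` in `reference`."""
    count, start = 0, reference.find(seed)
    while start != -1:
        count += 1
        start = reference.find(seed, start + 1)
    return count


def optimal_seeds(read, x, freq, min_len=1):
    """Split `read` into x consecutive seeds (each at least `min_len` long)
    that minimise the total seed frequency reported by `freq`.

    opt[k][j] holds the minimum total frequency of covering read[:j]
    with k seeds; back[k][j] remembers where the k-th seed starts.
    """
    L = len(read)
    if x < 1 or x * min_len > L:
        raise ValueError("read too short for x seeds of length min_len")
    INF = float("inf")
    cache = {}  # at most L*(L+1)/2 distinct substrings are ever looked up

    def f(i, j):
        if (i, j) not in cache:
            cache[(i, j)] = freq(read[i:j])
        return cache[(i, j)]

    opt = [[INF] * (L + 1) for _ in range(x + 1)]
    back = [[-1] * (L + 1) for _ in range(x + 1)]
    opt[0][0] = 0
    for k in range(1, x + 1):
        for j in range(k * min_len, L + 1):
            # The k-th seed is read[i:j]; the first k-1 seeds cover read[:i].
            for i in range((k - 1) * min_len, j - min_len + 1):
                if opt[k - 1][i] == INF:
                    continue
                cost = opt[k - 1][i] + f(i, j)
                if cost < opt[k][j]:
                    opt[k][j], back[k][j] = cost, i

    # Trace the chosen seed boundaries back from the end of the read.
    seeds, j = [], L
    for k in range(x, 0, -1):
        i = back[k][j]
        seeds.append(read[i:j])
        j = i
    seeds.reverse()
    return opt[x][L], seeds
```

For example, with the toy reference `"AAAAAAAACGT"`, splitting the read `"AACG"` into two seeds picks `"AAC" | "G"` (total frequency 2) rather than `"AA" | "CG"` (total 8), because lengthening the first seed past the repetitive `A` run makes it far rarer. A fixed, equal-length split would have been forced into the costlier choice.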

Availability and implementation

We provide an implementation of the Optimal Seed Solver in C++ at: https://github.com/CMU-SAFARI/Optimal-Seed-Solver

Contact

hxin@cmu.edu, calkan@cs.bilkent.edu.tr or onur@cmu.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Xin H

PROVIDER: S-EPMC6363230 | biostudies-literature | 2016 Jun

REPOSITORIES: biostudies-literature

Publications

Optimal seed solver: optimizing seed selection in read mapping.

Hongyi Xin, Sunny Nahar, Richard Zhu, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan, Onur Mutlu

Bioinformatics (Oxford, England) 20151114 11

PMID: 26568624

Similar Datasets

The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote.
| S-EPMC3664803 | biostudies-literature

Optimizing read mapping to reference genomes to determine composition and species prevalence in microbial communities.
| S-EPMC3374613 | biostudies-literature

Short Read Mapping: An Algorithmic Tour.
| S-EPMC5425171 | biostudies-other

Self-extinction through optimizing selection.
| S-EPMC3730061 | biostudies-other

Optimizing cord blood selection.
| S-EPMC6913431 | biostudies-literature

Accelerating read mapping with FastHASH.
| S-EPMC3549798 | biostudies-literature

HINGE: long-read assembly achieves optimal repeat resolution.
| S-EPMC5411769 | biostudies-literature

Context-aware seeds for read mapping.
| S-EPMC7245042 | biostudies-literature

Deconvolution of breast tumors for optimal chemotherapy selection
2021-12-10 | GSE168410 | GEO

Improving read mapping using additional prefix grams.
| S-EPMC3927682 | biostudies-literature