Dataset Information

Improved design and analysis of practical minimizers.


ABSTRACT:

Motivation

Minimizers are methods for sampling k-mers from a string, with the guarantee that similar sets of k-mers will be chosen on similar strings. A minimizer is parameterized by the k-mer length k, a window length w and an order on the k-mers. Minimizers are used in a large number of software tools and pipelines to improve computational efficiency and decrease memory usage. Despite the method's popularity, many theoretical questions regarding its performance remain open. The core metric for measuring the performance of a minimizer is its density, which measures the sparsity of the sampled k-mers. The theoretical optimal density for a minimizer is 1/w, which is provably not achievable in general. For given k and w, little is known about asymptotically optimal minimizers, that is, minimizers with density O(1/w).
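
To make the three parameters concrete, here is a minimal sketch of a random-order minimizer. It is an illustration only: the function names, the seeded hash used as the k-mer order and the leftmost tie-breaking rule are assumptions, not something taken from the paper. Density is measured here as the fraction of k-mer positions selected on one particular string, which is one simple way to put the metric into practice.

```python
import hashlib


def kmer_rank(kmer, seed=0):
    """Pseudo-random rank of a k-mer from a seeded hash (illustrative order)."""
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_positions(seq, k, w, seed=0):
    """Start positions of the k-mers selected by a random-order minimizer.

    Each window holds w consecutive k-mers. The k-mer with the smallest
    rank is selected; ties go to the leftmost k-mer (an assumed convention).
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return set()
    ranks = [kmer_rank(seq[i:i + k], seed) for i in range(n_kmers)]
    selected = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (ranks[i], i)))
    return selected


def density(seq, k, w, seed=0):
    """Fraction of k-mer positions in seq that the minimizer selects."""
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return 0.0
    return len(minimizer_positions(seq, k, w, seed)) / n_kmers
```

Because every window of w consecutive k-mers contributes at least one selected position, the measured density cannot fall much below 1/w on a long string, which matches the lower bound quoted above.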

Results

We derive a necessary and sufficient condition for the existence of asymptotically optimal minimizers. We also provide a randomized algorithm, called the Miniception, to design minimizers with the best theoretical guarantee to date on density in practical scenarios. Constructing and using the Miniception is as easy as constructing and using a random minimizer, which allows the design of efficient minimizers that scale to the values of k and w used in current bioinformatics software programs.
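
The abstract does not spell out the construction, so the following is only a rough sketch based on one reading of the published Miniception idea: a k-mer is preferred when its smallest k0-mer, under a random order, sits at its first or last position, and preferred k-mers are ordered before all others. The parameter name k0, the helper names, the hash-based orders and the tie handling are all assumptions made for illustration. The reference implementation in the authors' repository is the authoritative version.

```python
import hashlib


def _rank(kmer, seed):
    """Pseudo-random rank of a string from a seeded hash (illustrative order)."""
    digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def in_preferred_set(kmer, k0, seed=0):
    """True if the smallest k0-mer of kmer is its first or last k0-mer."""
    n_sub = len(kmer) - k0 + 1
    ranks = [_rank(kmer[i:i + k0], seed) for i in range(n_sub)]
    smallest = min(ranks)
    return ranks[0] == smallest or ranks[-1] == smallest


def miniception_key(kmer, k0, seed=0):
    """Sort key: preferred k-mers first, then a random order within each group."""
    group = 0 if in_preferred_set(kmer, k0, seed) else 1
    return (group, _rank(kmer, seed + 1))


def miniception_positions(seq, k, w, k0, seed=0):
    """Start positions selected by a minimizer that uses miniception_key as its order."""
    if not 0 < k0 <= k:
        raise ValueError("k0 must satisfy 0 < k0 <= k")
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return set()
    keys = [miniception_key(seq[i:i + k], k0, seed) for i in range(n_kmers)]
    selected = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (keys[i], i)))
    return selected
```

In this sketch the only change from a plain random minimizer is the sort key, which is consistent with the statement that the Miniception is as easy to construct and use as a random minimizer.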

Availability and implementation

A reference implementation of the Miniception and the analysis code can be found at https://github.com/kingsford-group/miniception.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Zheng H 

PROVIDER: S-EPMC8248892 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC7190069 | biostudies-literature
| S-EPMC6247926 | biostudies-literature
| S-EPMC8307720 | biostudies-literature
2013-06-01 | E-MTAB-1636 | biostudies-arrayexpress
| S-EPMC8156997 | biostudies-literature
| S-EPMC6327964 | biostudies-literature
| S-EPMC2769969 | biostudies-literature
| S-EPMC7090093 | biostudies-literature
| S-EPMC6198738 | biostudies-literature
| S-EPMC7340549 | biostudies-literature