
Dataset Information


Syncmers are more sensitive than minimizers for selecting conserved k-mers in biological sequences.


ABSTRACT: Minimizers are widely used to select subsets of fixed-length substrings (k-mers) from biological sequences in applications ranging from read mapping to taxonomy prediction and indexing of large datasets. The minimizer of a string of w consecutive k-mers is the k-mer with smallest value according to an ordering of all k-mers. Syncmers are defined here as a family of alternative methods which select k-mers by inspecting the position of the smallest-valued substring of length s < k within the k-mer. For example, a closed syncmer is selected if its smallest s-mer is at the start or end of the k-mer. At least one closed syncmer must be found in every window of length (k - s) k-mers. Unlike a minimizer, a syncmer is identified by its sequence alone, and is therefore synchronized in the following sense: if a given k-mer is selected from one sequence, it will also be selected from any other sequence. Also, minimizers can be deleted by mutations in flanking sequence, which cannot happen with syncmers. Experiments on minimizers with parameters used in the minimap2 read mapper and Kraken taxonomy prediction algorithm respectively show that syncmers can simultaneously achieve both lower density and higher conservation compared to minimizers.
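The two selection rules described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names are hypothetical, k-mers and s-mers are compared lexicographically (practical tools usually order them by a hash value), and ties are broken by taking the leftmost smallest substring, which is an assumption made here for simplicity.

```python
def kmers(seq, k):
    """Return all substrings of length k, in order of start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizer_positions(seq, k, w):
    """Start positions of the minimizer of each window of w consecutive k-mers.

    Assumption: lexicographic ordering, leftmost k-mer wins ties.
    """
    km = kmers(seq, k)
    selected = set()
    for start in range(len(km) - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (km[i], i)))
    return sorted(selected)


def is_closed_syncmer(kmer, s):
    """True if the smallest s-mer sits at the start or end of the k-mer.

    Assumption: lexicographic ordering, leftmost smallest s-mer wins ties.
    The decision uses only the k-mer itself, never its flanking sequence.
    """
    smers = kmers(kmer, s)
    smallest = smers.index(min(smers))  # index() returns the leftmost match
    return smallest == 0 or smallest == len(kmer) - s


def closed_syncmer_positions(seq, k, s):
    """Start positions of all k-mers in seq that are closed syncmers."""
    return [i for i, km in enumerate(kmers(seq, k)) if is_closed_syncmer(km, s)]


if __name__ == "__main__":
    seq = "GATTACA"
    print(minimizer_positions(seq, k=3, w=2))
    print(closed_syncmer_positions(seq, k=3, s=1))
```

Because `is_closed_syncmer` takes only the k-mer as input, the same k-mer receives the same decision in every sequence that contains it. `minimizer_positions`, in contrast, depends on the neighbouring k-mers in each window, so a mutation outside a k-mer can change whether that k-mer is selected.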

SUBMITTER: Edgar R 

PROVIDER: S-EPMC7869670 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature


Publications

Syncmers are more sensitive than minimizers for selecting conserved k-mers in biological sequences.

Edgar Robert R  

PeerJ, 2021-02-05



Similar Datasets

| S-EPMC5889393 | biostudies-literature
| S-EPMC4111549 | biostudies-literature
| S-EPMC5537200 | biostudies-other
| S-EPMC2241842 | biostudies-other
| PRJEB18734 | ENA
| S-EPMC4937073 | biostudies-literature
| S-EPMC5533270 | biostudies-literature
| S-EPMC6549478 | biostudies-literature
| S-EPMC403677 | biostudies-literature