Dataset Information

KCMBT: a k-mer Counter based on Multiple Burst Trees.

ABSTRACT: A massive number of bioinformatics applications require counting of k-length substrings in genetically important long strings. A k-mer counter generates the frequencies of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. To be useful in such applications, algorithms that count k-mers in large data sets must be very fast and efficient. We propose a novel trie-based algorithm for this k-mer counting problem. We compare our algorithm, k-mer Counter based on Multiple Burst Trees (KCMBT), with all available well-known algorithms. Our experimental results show that KCMBT is around 30% faster than the previous best-performing algorithm, KMC2, on a human genome dataset. As another example, our algorithm is around six times faster than Jellyfish2. Overall, KCMBT is 20-30% faster than KMC2 on five benchmark data sets when both algorithms were run using multiple threads. KCMBT is freely available on GitHub (https://github.com/abdullah009/kcmbt_mt). Contact: rajasek@engr.uconn.edu. Supplementary data are available at Bioinformatics online.
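
For orientation, the counting task the abstract describes (the frequency of each k-length substring) can be sketched in a few lines. This is a naive hash-map counter written only to illustrate the problem; it is not the burst-trie method used by KCMBT, and the function name `count_kmers` and the toy sequence are illustrative assumptions.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every k-length substring of `sequence` (naive, hash-map based)."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    # Slide a window of width k across the sequence, one position at a time.
    for i in range(len(sequence) - k + 1):
        counts[sequence[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy input; real inputs are genome-scale sequence files.
    for kmer, freq in sorted(count_kmers("ACGTACGT", 3).items()):
        print(kmer, freq)
```

A sequence of length n yields n - k + 1 windows, so memory and time grow quickly on genome-scale data, which is the cost that dedicated counters such as those compared in the abstract aim to reduce.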

SUBMITTER: Mamun AA

PROVIDER: S-EPMC5939891 | biostudies-other | 2016 Sep

REPOSITORIES: biostudies-other

Publications

KCMBT: a k-mer Counter based on Multiple Burst Trees.

Abdullah-Al Mamun, Soumitra Pal, Sanguthevar Rajasekaran

Bioinformatics (Oxford, England), 2016-06-09, issue 18

PMID: 27283950

Similar Datasets

Oxidative damage and respiratory burst in Multiple Sclerosis
2011-10-13 | GSE32915 | GEO

Quantum key based burst confidentiality in optical burst switched networks.
| S-EPMC3919048 | biostudies-other

Oxidative damage and respiratory burst in Multiple Sclerosis
2011-10-12 | E-GEOD-32915 | biostudies-arrayexpress

Optimization of a 40-mer Antimyelin DNA Aptamer Identifies a 20-mer with Enhanced Properties for Potential Multiple Sclerosis Therapy.
| S-EPMC6555174 | biostudies-literature

Disk-based k-mer counting on a PC.
| S-EPMC3680041 | biostudies-literature

FQSqueezer: k-mer-based compression of sequencing data.
| S-EPMC6969201 | biostudies-literature

Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs.
| S-EPMC4419412 | biostudies-literature

QuCo: quartet-based co-estimation of species trees and gene trees.
| S-EPMC9235488 | biostudies-literature

Interpreting k-mer-based signatures for antibiotic resistance prediction.
| S-EPMC7568433 | biostudies-literature

Multiple Daily Rounds of Theta-Burst Stimulation for Tinnitus: Preliminary Results.
| S-EPMC8401076 | biostudies-literature