Dataset Information

KAnalyze: a fast versatile pipelined k-mer toolkit.

ABSTRACT:

Motivation

Converting nucleotide sequences into short overlapping fragments of uniform length, k-mers, is a common step in many bioinformatics applications. While existing software packages count k-mers, few are optimized for speed, offer an application programming interface (API) or a graphical interface, or contain features that make them extensible and maintainable. We designed KAnalyze to compete with the fastest k-mer counters, to produce reliable output and to support future development efforts through well-architected, documented and testable code. Currently, KAnalyze can output k-mer counts in a sorted tab-delimited file or stream k-mers as they are read. KAnalyze can process large datasets with 2 GB of memory. This project is implemented in Java 7, and the command line interface (CLI) is designed to integrate into pipelines written in any language.
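To make the two output modes described above concrete, here is a minimal Python sketch of k-mer streaming and counting. It is not KAnalyze's code or API: the function names (`stream_kmers`, `count_kmers`, `format_counts`) are hypothetical, the choice to skip windows containing non-ACGT characters is an assumption of this sketch, and it holds all counts in memory rather than using the pipelined, memory-bounded design the abstract describes.

```python
from collections import Counter
from typing import Iterable, Iterator


def stream_kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield each overlapping k-mer of `sequence` in order of appearance."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence shorter than k yields nothing (the range is empty).
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        # Assumption of this sketch: skip windows with ambiguous bases (e.g. N).
        if set(kmer) <= set("ACGT"):
            yield kmer


def count_kmers(sequences: Iterable[str], k: int) -> Counter:
    """Count k-mers across all sequences (everything is kept in memory)."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(stream_kmers(seq, k))
    return counts


def format_counts(counts: Counter) -> str:
    """Render counts as tab-delimited 'kmer<TAB>count' lines, sorted by k-mer."""
    return "\n".join(f"{kmer}\t{count}" for kmer, count in sorted(counts.items()))
```

For example, `format_counts(count_kmers(["ACGTACGT"], 4))` produces four lines in lexicographic order, with `ACGT` counted twice and `CGTA`, `GTAC` and `TACG` once each. Iterating over `stream_kmers` directly corresponds to the streaming mode, where each k-mer is handed on as soon as it is read.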

Results

As a k-mer counter, KAnalyze outperforms Jellyfish, DSK and a pipeline built on Perl and Linux utilities. Through extensive unit and system testing, we have verified that KAnalyze produces the correct k-mer counts over multiple datasets and k-mer sizes.

Availability and implementation

KAnalyze is available on SourceForge: https://sourceforge.net/projects/kanalyze/.

SUBMITTER: Audano P

PROVIDER: S-EPMC4080738 | biostudies-literature | 2014 Jul

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

KAnalyze: a fast versatile pipelined k-mer toolkit.

Audano Peter P, Vannberg Fredrik F

Bioinformatics (Oxford, England), 2014-03-18, issue 14

PMID: 24642064

Similar Datasets

GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes.
| S-EPMC7488116 | biostudies-literature

A versatile toolkit for overcoming AAV immunity.
| S-EPMC9479010 | biostudies-literature

KaMRaT: a C++ toolkit for k-mer count matrix dimension reduction.
| S-EPMC10942800 | biostudies-literature

Versatile toolkit for high-efficiency and scarless overexpression of circular RNAs
2023-12-14 | GSE246020 | GEO

KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies.
| S-EPMC5408915 | biostudies-literature

hictk: blazing fast toolkit to work with .hic and .cool files
2023-09-14 | GSE242815 | GEO

BuildAMol: a versatile Python toolkit for fragment-based molecular design.
| S-EPMC11345998 | biostudies-literature

CyDotian: a versatile toolkit for identification of intragenic repeat sequences.
| S-EPMC11462849 | biostudies-literature

PKCalpha: a versatile key for decoding the cellular calcium toolkit.
| S-EPMC2064258 | biostudies-literature

AnnTools: a comprehensive and versatile annotation toolkit for genomic variants.
| S-EPMC3289923 | biostudies-literature