Dataset Information

ntHash: recursive nucleotide hashing.

ABSTRACT:

Motivation

Hashing has been widely used for indexing, querying and rapid similarity search in many bioinformatics applications, including sequence alignment, genome and transcriptome assembly, k-mer counting and error correction. Hence, expediting hashing operations would have a substantial impact in the field, making bioinformatics applications faster and more efficient.

Results

We present ntHash, a hashing algorithm tuned for processing DNA/RNA sequences. It performs the best when calculating hash values for adjacent k-mers in an input sequence, operating an order of magnitude faster than the best performing alternatives in typical use cases.
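The speed-up for adjacent k-mers can be illustrated with a generic rolling hash. The sketch below is not the authors' implementation: it is a minimal cyclic-polynomial rolling hash written under assumptions. The per-nucleotide seed constants, the 64-bit word size and the function names (`rol`, `hash_kmer`, `rolling_hashes`) are hypothetical choices made for illustration, and only the bases A, C, G and T are handled. It shows how the hash of the next k-mer can be derived from the previous one in constant time instead of rehashing all k bases.

```python
MASK = (1 << 64) - 1  # assumed 64-bit hash width

# Hypothetical per-nucleotide seeds: arbitrary constants for illustration,
# not the values used by ntHash.
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0x27D4EB2F165667C5,
}


def rol(x, r):
    """Rotate the 64-bit value x left by r bits (r is taken modulo 64)."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def hash_kmer(kmer):
    """Hash a single k-mer from scratch in O(k) time."""
    h = 0
    for base in kmer:
        h = rol(h, 1) ^ SEED[base]
    return h


def rolling_hashes(seq, k):
    """Yield the hash of every k-mer in seq, updating in O(1) per step."""
    if k <= 0 or len(seq) < k:
        return
    h = hash_kmer(seq[:k])
    yield h
    for i in range(k, len(seq)):
        outgoing = seq[i - k]  # base leaving the window
        incoming = seq[i]      # base entering the window
        # Rotate the old hash, cancel the outgoing base, add the incoming one.
        h = rol(h, 1) ^ rol(SEED[outgoing], k) ^ SEED[incoming]
        yield h
```

For a sequence `seq` and k-mer length `k`, `list(rolling_hashes(seq, k))` gives the same values as calling `hash_kmer` on each window separately, but each step after the first costs two rotations and two XORs regardless of `k`.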

Availability and implementation

ntHash is available online at http://www.bcgsc.ca/platform/bioinfo/software/nthash and is free for academic use.

Contacts

hmohamadi@bcgsc.ca or ibirol@bcgsc.ca

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Mohamadi H

PROVIDER: S-EPMC5181554 | biostudies-literature | 2016 Nov

REPOSITORIES: biostudies-literature

Publications

ntHash: recursive nucleotide hashing.

Mohamadi H, Chu J, Vandervalk BP, Birol I

Bioinformatics (Oxford, England), 2016-07-16, issue 22

PMID: 27423894