
Dataset Information


Simrank: Rapid and sensitive general-purpose k-mer search tool.


ABSTRACT: Terabyte-scale collections of string-encoded data are expected from consortium efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp. Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedding k-mer searches as subroutines. However, no rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has been available.

Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank to be 10X to 928X faster, depending on the dataset.

Simrank provides molecular ecologists with a high-throughput, open-source choice for comparing large sequence sets to find similarity.
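The abstract does not describe Simrank's scoring function, so the following is only a minimal sketch of the general k-mer ranking idea, not Simrank's implementation. It assumes that similarity is the fraction of the query's distinct k-mers also present in a database string, and it uses an inverted index (k-mer to database IDs) so that only strings sharing at least one k-mer with the query are scored. The function names (`kmers`, `build_index`, `rank_matches`), the toy sequences and the choice k = 4 are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the distinct overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(database: dict[str, str], k: int) -> dict[str, list[str]]:
    """Inverted index: map each k-mer to the IDs of database strings containing it."""
    index: dict[str, list[str]] = defaultdict(list)
    for seq_id, seq in database.items():
        for kmer in kmers(seq, k):
            index[kmer].append(seq_id)
    return index


def rank_matches(query: str, index: dict[str, list[str]], k: int,
                 top_n: int = 5) -> list[tuple[str, float]]:
    """Rank database strings by the fraction of the query's distinct k-mers they share.

    Assumption: this score is illustrative only; Simrank's own scoring may differ.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return []
    shared: Counter[str] = Counter()
    for kmer in query_kmers:
        # .get() avoids inserting empty lists into the defaultdict.
        for seq_id in index.get(kmer, ()):
            shared[seq_id] += 1
    total = len(query_kmers)
    # Highest score first; ties broken alphabetically by ID for stable output.
    ranked = sorted(((seq_id, count / total) for seq_id, count in shared.items()),
                    key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical toy database of DNA strings.
database = {
    "seqA": "ACGTACGTTGCA",
    "seqB": "TTTTGGGGCCCC",
    "seqC": "ACGTACGAAAAA",
}
k = 4
index = build_index(database, k)
result = rank_matches("ACGTACGTTG", index, k)
print(result)  # [('seqA', 1.0), ('seqC', 0.6666666666666666)]
```

Here `seqA` contains all six distinct 4-mers of the query, `seqC` shares four of them, and `seqB` shares none, so it never enters the candidate set. Because candidates come only from the index, the cost of a query scales with the number of k-mer hits rather than with the size of the whole database.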

SUBMITTER: DeSantis TZ 

PROVIDER: S-EPMC3097142 | biostudies-literature | 2011 Apr

REPOSITORIES: biostudies-literature



Similar Datasets

| S-EPMC3124697 | biostudies-literature
| S-EPMC2896154 | biostudies-literature
| S-EPMC4004051 | biostudies-literature
| S-EPMC8481385 | biostudies-literature
| S-EPMC6821417 | biostudies-literature
| S-EPMC5657049 | biostudies-literature
| S-EPMC6298053 | biostudies-literature
| S-EPMC524372 | biostudies-literature
| S-EPMC6019046 | biostudies-literature
| S-EPMC3880547 | biostudies-literature