Dataset Information

Large-scale sequence comparisons with sourmash.


ABSTRACT: The sourmash software package uses MinHash-based sketching to create "signatures", compressed representations of DNA, RNA, and protein sequences, that can be stored, searched, explored, and taxonomically annotated. sourmash signatures can be used to estimate sequence similarity between very large data sets quickly and in low memory, and can be used to search large databases of genomes for matches to query genomes and metagenomes. sourmash is implemented in C++, Rust, and Python, and is freely available under the BSD license at http://github.com/dib-lab/sourmash.
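The sketching idea in the abstract can be illustrated with a minimal bottom-k MinHash sketch in plain Python. This is not sourmash's own implementation or API: the function names, the SHA-1 hash, and the defaults `k=5` and `num=50` are assumptions chosen only to keep the example self-contained.

```python
import hashlib


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 is an illustrative stand-in)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seq, k=5, num=50):
    """Bottom-k sketch: keep only the `num` smallest k-mer hashes."""
    hashes = sorted({hash_kmer(km) for km in kmers(seq, k)})
    return set(hashes[:num])


def estimate_jaccard(a, b, num=50):
    """Estimate Jaccard similarity from two bottom-k sketches.

    Take the `num` smallest hashes of the combined sketches, then count
    how many of those appear in both sketches.
    """
    union_bottom = set(sorted(a | b)[:num])
    if not union_bottom:
        return 0.0
    shared = union_bottom & a & b
    return len(shared) / len(union_bottom)


if __name__ == "__main__":
    s1 = sketch("ACGTACGTTAGCCGATAGGCTTAACG")
    s2 = sketch("ACGTACGTTAGCCGATAGGCTTAACC")
    print(estimate_jaccard(s1, s2))
```

Because each sketch holds at most `num` hashes regardless of input length, two very large sequence sets can be compared by their sketches alone, which is the source of the speed and memory savings the abstract describes.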

SUBMITTER: Pierce NT 

PROVIDER: S-EPMC6720031 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Large-scale sequence comparisons with sourmash.

N. Tessa Pierce, Luiz Irber, Taylor Reiter, Phillip Brooks, C. Titus Brown

F1000Research, 2019-07-04


Similar Datasets

| S-EPMC7069636 | biostudies-literature
| S-EPMC4868480 | biostudies-literature
| S-EPMC10567571 | biostudies-literature
| S-EPMC2665050 | biostudies-literature
| S-EPMC4896366 | biostudies-literature
| S-EPMC3286201 | biostudies-literature
| S-EPMC1524917 | biostudies-literature
| S-EPMC6041967 | biostudies-literature
| S-EPMC4185199 | biostudies-other
| S-EPMC419331 | biostudies-literature