
Dataset Information


Dashing: fast and accurate genomic distances with HyperLogLog.


ABSTRACT: Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing.
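The idea in the abstract, sketching each genome's k-mer set with HyperLogLog and then estimating union and intersection sizes from the sketches, can be illustrated with a minimal Python sketch. This is not Dashing's implementation. It uses the classic HyperLogLog estimator with a linear-counting correction for small sets, and plain inclusion-exclusion for the intersection, rather than the specialized estimators the abstract refers to. The class and function names, the BLAKE2b hash, and the defaults k=21 and p=12 are all assumptions chosen for illustration. It also skips canonical (strand-independent) k-mers.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash via BLAKE2b (an assumption; any well-mixed 64-bit hash works).
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other):
        # The sketch of a set union is the register-wise maximum.
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def cardinality(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)       # linear counting for small sets
        return raw


def kmers(seq, k):
    """Return the set of distinct k-mers in seq (no canonicalization)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=21, p=12):
    """Build a HyperLogLog sketch of the k-mers of seq."""
    hll = HyperLogLog(p)
    for kmer in kmers(seq, k):
        hll.add(kmer)
    return hll


def jaccard(a, b):
    """Estimate Jaccard similarity using inclusion-exclusion on sketch cardinalities."""
    union = a.union(b).cardinality()
    if union == 0:
        return 0.0
    inter = a.cardinality() + b.cardinality() - union
    return max(0.0, inter / union)
```

Calling `jaccard(sketch(seq_a), sketch(seq_b))` on two sequence strings returns an estimated similarity between 0 and 1. Because the union is a register-wise maximum, sketches can be built once per genome and compared pairwise without revisiting the sequences.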

SUBMITTER: Baker DN 

PROVIDER: S-EPMC6892282 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature


Publications

Dashing: fast and accurate genomic distances with HyperLogLog.

Baker DN, Langmead B

Genome Biology, 2019 Dec 4; issue 1



Similar Datasets

| S-EPMC4179615 | biostudies-literature
| S-EPMC9234764 | biostudies-literature
| S-EPMC10538361 | biostudies-literature
| S-EPMC6396417 | biostudies-literature
| S-EPMC4428112 | biostudies-literature
| S-EPMC4848767 | biostudies-literature
| S-EPMC2045152 | biostudies-literature
| S-EPMC8017614 | biostudies-literature
| S-EPMC3118168 | biostudies-literature
| S-EPMC7141870 | biostudies-literature