
Dataset Information


Mash Screen: high-throughput sequence containment estimation for genome discovery.


ABSTRACT: The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
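The distinction between resemblance and containment in the abstract can be illustrated with a minimal sketch. It assumes exact k-mer sets rather than MinHash sketches, and it is not the Mash Screen algorithm itself. The sequences are synthetic, and the function names (`kmers`, `jaccard`, `containment`) and the choice of k = 16 are hypothetical, chosen only for illustration.

```python
import random

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Resemblance: shared k-mers divided by the union of both sets."""
    return len(a & b) / len(a | b)

def containment(a, b):
    """Containment of a in b: fraction of a's k-mers that also occur in b."""
    return len(a & b) / len(a)

# Synthetic data (assumption): a short "genome" embedded in a much
# larger random "metagenome".
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(500))
background = "".join(rng.choice("ACGT") for _ in range(20000))
metagenome = background[:10000] + genome + background[10000:]

k = 16  # illustrative k-mer size, not a value taken from the paper
g = kmers(genome, k)
m = kmers(metagenome, k)

print("resemblance:", jaccard(g, m))
print("containment:", containment(g, m))
```

Because the genome is fully present in the metagenome, its containment is 1.0, while the resemblance stays small: the union in the denominator is dominated by the metagenome's other k-mers. This is the size mismatch that makes a resemblance estimate uninformative for asking whether a genome occurs in a metagenome.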

SUBMITTER: Ondov BD 

PROVIDER: S-EPMC6833257 | biostudies-literature | 2019 Nov

REPOSITORIES: biostudies-literature


Publications

Mash Screen: high-throughput sequence containment estimation for genome discovery.

Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, Phillippy AM

Genome Biology, 2019 Nov 5, issue 1



Similar Datasets

| S-EPMC1894642 | biostudies-literature
| S-EPMC4915045 | biostudies-literature
| S-EPMC9299179 | biostudies-literature
| S-EPMC11305015 | biostudies-literature
| S-EPMC314289 | biostudies-literature
| S-EPMC9298130 | biostudies-literature
| S-EPMC4538536 | biostudies-literature
| S-EPMC8656376 | biostudies-literature
| S-EPMC5947915 | biostudies-literature
| S-EPMC4125401 | biostudies-literature