Dataset Information

Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps.


ABSTRACT: Metagenomic sequence classification should be fast, accurate and information-rich. Emerging long-read sequencing technologies promise to improve the balance between these factors, but most existing methods were designed for short reads. MetaMaps is a new method, specifically developed for long reads, capable of mapping a long-read metagenome to a comprehensive RefSeq database with >12,000 genomes in <16 GB of RAM on a laptop computer. Integrating approximate mapping with probabilistic scoring and EM-based estimation of sample composition, MetaMaps achieves >94% accuracy for species-level read assignment and r² > 0.97 for the estimation of sample composition on both simulated and real data when the sample genomes or close relatives are present in the classification database. To address novel species and genera, which are comparatively harder to predict, MetaMaps outputs mapping locations and qualities for all classified reads, enabling functional studies (e.g. gene presence/absence) and detection of incongruities between sample and reference genomes.

SUBMITTER: Dilthey AT 

PROVIDER: S-EPMC6624308 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps.

Dilthey AT, Jain C, Koren S, Phillippy AM

Nature Communications, 2019 Jul 11, issue 1


Similar Datasets

S-EPMC3462201 | biostudies-literature
S-EPMC3232206 | biostudies-literature
S-EPMC3592424 | biostudies-other
S-EPMC5533426 | biostudies-other
S-EPMC7506068 | biostudies-literature
S-EPMC10433603 | biostudies-literature
S-EPMC3496342 | biostudies-literature
S-EPMC5870846 | biostudies-literature
S-EPMC5314789 | biostudies-literature
S-EPMC8388557 | biostudies-literature