Dataset Information

Scalable metagenomic taxonomy classification using a reference genome database.


ABSTRACT:

MOTIVATION: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge.

RESULTS: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample.

AVAILABILITY: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat

CONTACT: allen99@llnl.gov

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

SUBMITTER: Ames SK 

PROVIDER: S-EPMC3753567 | biostudies-literature | 2013 Sep

REPOSITORIES: biostudies-literature

Publications

Scalable metagenomic taxonomy classification using a reference genome database.

Ames SK, Hysom DA, Gardner SN, Lloyd GS, Gokhale MB, Allen JE

Bioinformatics (Oxford, England), 2013-07-04, issue 18



Similar Datasets

| S-EPMC8601625 | biostudies-literature
| S-EPMC4053813 | biostudies-other
| S-EPMC7045516 | biostudies-literature
| S-EPMC9580935 | biostudies-literature
| S-EPMC3953531 | biostudies-literature
2015-05-01 | GSE58431 | GEO
| S-EPMC4888754 | biostudies-other
| S-EPMC3223155 | biostudies-literature
| S-EPMC7540445 | biostudies-literature
| S-EPMC10327048 | biostudies-literature