
Dataset Information


INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences.


ABSTRACT:

Background

Taxonomic classification of metagenomic sequences is the first step in metagenomic analysis. Existing taxonomic classification approaches are of two types: similarity-based and composition-based. Similarity-based approaches, though accurate and specific, are extremely slow. Since metagenomic projects generate millions of sequences, similarity-based approaches are virtually infeasible for research groups with modest computational resources. In this study, we present INDUS, a composition-based approach with the following novel features. First, INDUS discards the 'one genome-one composition' model adopted by existing compositional approaches. Second, INDUS uses 'compositional distance' information to identify appropriate assignment levels. Third, INDUS incorporates steps that attempt to reduce biases due to database representation.
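The two ideas named above (several compositions per genome instead of one, and using the compositional distance to decide how deep a read is assigned) can be illustrated with a minimal, generic sketch. This is not the INDUS algorithm: tetranucleotide (k = 4) frequencies, Euclidean distance, fixed-length genome fragments as the composition units, and the per-rank distance thresholds are all assumptions made for illustration, and the functions `composition`, `build_reference` and `classify` are hypothetical names. The database-bias correction step is not modelled.

```python
import math
from collections import Counter
from itertools import product

# Assumption: tetranucleotide (k = 4) frequencies; the abstract does not state k.
K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def composition(seq):
    """Normalized frequency vector over all 4**K DNA k-mers of `seq`.

    k-mers containing non-ACGT characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[km] for km in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[km] / total for km in KMERS]


def distance(a, b):
    """Euclidean distance between two composition vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_reference(genomes, fragment_len):
    """Store one composition per fixed-length fragment instead of one per genome.

    `genomes` is a list of (lineage, sequence) pairs, where lineage is a tuple
    of taxon names ordered from the highest to the lowest rank.
    """
    reference = []
    for lineage, seq in genomes:
        for start in range(0, len(seq) - fragment_len + 1, fragment_len):
            fragment = seq[start:start + fragment_len]
            reference.append((lineage, composition(fragment)))
    return reference


def classify(read, reference, thresholds):
    """Assign `read` to the nearest reference composition, truncated to the
    deepest rank whose (hypothetical) distance threshold is still satisfied.

    `thresholds` holds one maximum distance per rank, highest rank first, and
    should decrease with depth. Returns (assigned_lineage, best_distance);
    an empty lineage means the read is left unassigned.
    """
    comp = composition(read)
    lineage, best = min(
        ((lin, distance(comp, ref)) for lin, ref in reference),
        key=lambda pair: pair[1],
    )
    depth = 0
    for max_dist in thresholds:
        if best > max_dist:
            break
        depth += 1
    return lineage[:depth], best


if __name__ == "__main__":
    # Toy "genomes" with deliberately different compositions (illustrative only).
    genomes = [
        (("PhylumA", "GenusA"), "AAAATTTT" * 200),
        (("PhylumB", "GenusB"), "GGGGCCCC" * 200),
    ]
    ref = build_reference(genomes, fragment_len=400)
    read = "AAAATTTT" * 25 + "ACGT" * 50
    print(classify(read, ref, thresholds=(0.4, 0.1)))
```

In this toy run, the mixed read is nearest to GenusA's fragments but not close enough to pass the genus-level threshold, so it is assigned only at the phylum level; a read copied from a GenusA fragment would be assigned down to the genus, and a read far from every fragment would stay unassigned. Keeping a composition per fragment rather than per genome lets locally atypical regions of a genome still find a close match.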

Results

INDUS rapidly classifies sequences in both simulated and real metagenomic data sets, with classification efficiency significantly higher than that of existing composition-based approaches. Although the classification efficiency of INDUS is comparable to that of similarity-based approaches, its binning is 23-33 times faster than that of alignment-based approaches.

Conclusion

Given its rapid execution time and high classification efficiency, INDUS is expected to be of considerable interest to researchers working in metagenomics and microbial ecology.

Availability

A web-server for the INDUS algorithm is available at http://metagenomics.atc.tcs.com/INDUS/

SUBMITTER: Mohammed MH 

PROVIDER: S-EPMC3333187 | biostudies-literature | 2011 Nov

REPOSITORIES: biostudies-literature


Publications

INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences.

Mohammed Monzoorul Haque MH, Ghosh Tarini Shankar TS, Reddy Rachamalla Maheedhar RM, Reddy Chennareddy Venkata Siva Kumar CV, Singh Nitin Kumar NK, Mande Sharmila S SS

BMC Genomics, 2011 Nov 30



Similar Datasets

| S-EPMC2957682 | biostudies-literature
| S-EPMC6085705 | biostudies-literature
| S-EPMC3152360 | biostudies-literature
| S-EPMC3319535 | biostudies-literature
| S-EPMC5131823 | biostudies-literature
| S-EPMC3294464 | biostudies-literature
| S-EPMC4051165 | biostudies-literature
| S-EPMC4218995 | biostudies-literature