
Dataset Information


MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks.


ABSTRACT: MOTIVATION: Metagenomics is a recent field of biology that studies microbial communities by analyzing their genomic content directly sequenced from the environment. A metagenomic dataset consists of many short DNA or RNA fragments called reads. One interesting problem in metagenomic data analysis is the discovery of the taxonomic composition of a given dataset. A simple method for this task, called the Lowest Common Ancestor (LCA), is employed in state-of-the-art computational tools for metagenomic data analysis of very short reads (about 100 bp). However, LCA has two main drawbacks: it may assign many reads to high taxonomic ranks, and it discards a large number of reads. RESULTS: We present MTR, a new method for tackling these drawbacks using clustering at Multiple Taxonomic Ranks. Unlike LCA, which processes the reads one by one, MTR exploits information shared by reads. Specifically, MTR consists of two main phases. First, for each taxonomic rank, a collection of potential clusters of reads is generated, and each potential cluster is associated with a taxon at that rank. Next, a small number of clusters is selected at each rank using a combinatorial optimization algorithm. The effectiveness of the resulting method is tested on a large number of simulated and real-life metagenomes. Results of experiments show that MTR improves on LCA by discarding significantly fewer reads and by assigning many more reads to lower taxonomic ranks. Moreover, MTR provides a more faithful taxonomic characterization of the metagenome population distribution. AVAILABILITY: Matlab and C++ source code for the method is available at http://cs.ru.nl/gori/software/MTR.tar.gz.
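
NOTE: The per-read LCA baseline described in the abstract can be illustrated with a minimal Python sketch. This is not the MTR code and not the implementation used by any particular tool. The toy taxonomy, the `PARENT` table, and the function names are all hypothetical, and the lineage is shortened for brevity. Each read is assigned to the deepest taxon shared by all of its hits, and a read with no hits is discarded.

```python
# Toy child -> parent taxonomy (hypothetical, with intermediate ranks omitted).
PARENT = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Bacillaceae",
    "Bacillaceae": "Bacteria",
    "Bacteria": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, inclusive."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(hit_taxa):
    """Return the deepest taxon shared by all hits, or None if there are no hits."""
    hit_taxa = list(hit_taxa)
    if not hit_taxa:
        return None  # a read without hits is discarded
    shared = set(lineage(hit_taxa[0]))
    for taxon in hit_taxa[1:]:
        shared &= set(lineage(taxon))
    # Walk the first lineage from leaf to root; the first shared node is the LCA.
    for node in lineage(hit_taxa[0]):
        if node in shared:
            return node
    return None


if __name__ == "__main__":
    print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"]))
    print(lowest_common_ancestor(["Escherichia coli", "Bacillus subtilis"]))
```

In this toy example, a read whose hits span two genera is pushed up to the family, and a read whose hits span two families is pushed up to "Bacteria". This is the first drawback the abstract attributes to LCA: reads with ambiguous hits end up at high taxonomic ranks.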

SUBMITTER: Gori F 

PROVIDER: S-EPMC3018814 | biostudies-other | 2011 Jan

REPOSITORIES: biostudies-other


Publications

MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks.

Fabio Gori, Gianluigi Folino, Mike S. M. Jetten, Elena Marchiori

Bioinformatics (Oxford, England), 2010-12-01, issue 2



Similar Datasets

S-EPMC3537596 | biostudies-literature
S-EPMC4141809 | biostudies-literature
S-EPMC4382904 | biostudies-other
S-EPMC3120705 | biostudies-literature
S-EPMC7214025 | biostudies-literature
S-EPMC3462201 | biostudies-literature
S-EPMC4545859 | biostudies-literature
S-EPMC3538547 | biostudies-literature
S-EPMC3614465 | biostudies-other
S-EPMC10873905 | biostudies-literature