Dataset Information

Mixture models for analysis of the taxonomic composition of metagenomes.


ABSTRACT: MOTIVATION: Inferring the taxonomic profile of a microbial community from a large collection of anonymous DNA sequencing reads is a challenging task in metagenomics. Because existing methods for taxonomic profiling of metagenomes are all based on the assignment of fragmentary sequences to phylogenetic categories, the accuracy of results largely depends on fragment length. This dependence complicates comparative analysis of data originating from different sequencing platforms or resulting from different preprocessing pipelines. RESULTS: Here we introduce a new method for taxonomic profiling based on mixture modeling of the overall oligonucleotide distribution of a sample. Our results indicate that the mixture-based profiles compare well with taxonomic profiles obtained with other methods. However, in contrast to the existing methods, our approach shows nearly constant profiling accuracy across read lengths and operates at an unrivaled speed. AVAILABILITY: A platform-independent implementation of the mixture modeling approach is available as a MATLAB/Octave toolbox at http://gobics.de/peter/taxy. In addition, a prototype implementation within an easy-to-use interactive tool for Windows can be downloaded.
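
The idea of profiling by mixture modeling of a sample's oligonucleotide distribution can be illustrated with a small sketch. This is not the Taxy toolbox and not necessarily the estimator used in the paper: it assumes toy reference "genomes", dinucleotide (k = 2) frequencies, and a plain expectation-maximization (EM) update for the mixture weights. The names `kmer_profile` and `mixture_weights` are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2, pseudocount=1e-6):
    """Relative frequency of every DNA k-mer in seq, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # The small pseudocount keeps every frequency strictly positive.
    raw = [counts[m] + pseudocount for m in kmers]
    total = sum(raw)
    return [x / total for x in raw]


def mixture_weights(sample, references, iters=500):
    """EM estimate of weights w with sum_i w[i] * references[i] ~ sample.

    The reference profiles stay fixed; only the mixing weights are updated.
    """
    n = len(references)
    w = [1.0 / n] * n
    for _ in range(iters):
        new_w = [0.0] * n
        for j, s_j in enumerate(sample):
            mix_j = sum(w[i] * references[i][j] for i in range(n))
            for i in range(n):
                # Share of k-mer j attributed to reference i, weighted by
                # how frequent that k-mer is in the sample.
                new_w[i] += s_j * w[i] * references[i][j] / mix_j
        w = new_w
    return w


# Toy references (assumptions for illustration, not real genomes).
ref_a = kmer_profile("ACGT" * 50)
ref_b = kmer_profile("AAGGCCTT" * 25)

# Synthetic sample profile: 70% of organism A, 30% of organism B.
sample = [0.7 * a + 0.3 * b for a, b in zip(ref_a, ref_b)]

weights = mixture_weights(sample, [ref_a, ref_b])
print([round(x, 3) for x in weights])
```

Because the sample is summarized by a single frequency vector, no individual read has to be assigned to a taxon in this sketch, which is consistent with the abstract's point that the approach is largely insensitive to read length.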

SUBMITTER: Meinicke P 

PROVIDER: S-EPMC3106201 | biostudies-other | 2011 Jun

REPOSITORIES: biostudies-other

Publications

Mixture models for analysis of the taxonomic composition of metagenomes.

Peter Meinicke, Kathrin Petra Asshauer, Thomas Lingner

Bioinformatics (Oxford, England), 2011-05-05, issue 12


Similar Datasets

| S-EPMC1887592 | biostudies-literature
| S-EPMC8591284 | biostudies-literature
| S-EPMC5181528 | biostudies-literature
| S-EPMC4836559 | biostudies-other
| S-EPMC4380030 | biostudies-literature
| S-EPMC6902526 | biostudies-literature
| S-EPMC6129288 | biostudies-literature
| S-EPMC1599726 | biostudies-literature
| S-EPMC4406156 | biostudies-literature
| S-EPMC6403234 | biostudies-literature