Dataset Information

Evaluating techniques for metagenome annotation using simulated sequence data.


ABSTRACT: The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naïve choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies.

SUBMITTER: Randle-Boggis RJ 

PROVIDER: S-EPMC4892715 | biostudies-literature | 2016 Jul

REPOSITORIES: biostudies-literature

Publications

Evaluating techniques for metagenome annotation using simulated sequence data.

Randle-Boggis RJ, Helgason T, Sapp M, Ashton PD

FEMS Microbiology Ecology, 2016-05-08, issue 7



Similar Datasets

| S-EPMC5493203 | biostudies-literature
| S-EPMC8792440 | biostudies-literature
| S-EPMC3814047 | biostudies-literature
| S-EPMC2587480 | biostudies-literature
| S-EPMC3100316 | biostudies-literature
| S-EPMC4985018 | biostudies-literature
| S-EPMC4860591 | biostudies-literature
| S-EPMC7290575 | biostudies-literature
| S-EPMC7305189 | biostudies-literature
| S-EPMC5180198 | biostudies-literature