Dataset Information

Testing the advantages and disadvantages of short- and long-read eukaryotic metagenomics using simulated reads.


ABSTRACT:

Background

The first step in understanding ecological community diversity and dynamics is quantifying community membership. An increasingly common method for doing so is metagenomics. Because of the rapidly increasing popularity of this approach, a large number of computational tools and pipelines are available for analysing metagenomic data. However, most of these tools have been designed and benchmarked using highly accurate short-read data (i.e. Illumina), and few studies have benchmarked classification accuracy for long, error-prone reads (PacBio or Oxford Nanopore). In addition, few tools have been benchmarked for non-microbial communities.

Results

Here we compare simulated long reads from Oxford Nanopore and Pacific Biosciences (PacBio) with high-accuracy Illumina read sets to systematically investigate the effects of sequence length and taxon type on classification accuracy for metagenomic data from both microbial and non-microbial communities. We show that, in general, classification accuracy is far lower for non-microbial communities, even at low taxonomic resolution (e.g. family rather than genus). We then show that, for two popular taxonomic classifiers, long reads can significantly increase classification accuracy, and that this effect is most pronounced for non-microbial communities.

Conclusions

This work provides insight into the expected accuracy of metagenomic analyses for different taxonomic groups, and establishes the point at which read length becomes more important than error rate for assigning the correct taxon.

SUBMITTER: Pearman WS 

PROVIDER: S-EPMC7257156 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature

Publications

Testing the advantages and disadvantages of short- and long-read eukaryotic metagenomics using simulated reads.

Pearman WS, Freed NE, Silander OK

BMC Bioinformatics, 2020-05-29, 1

Similar Datasets

| S-EPMC4720449 | biostudies-literature
| S-EPMC6822431 | biostudies-literature
| S-EPMC7850483 | biostudies-literature
| S-EPMC2911387 | biostudies-literature
2024-07-10 | GSE271528 | GEO
2024-07-10 | GSE271527 | GEO
| S-EPMC3460743 | biostudies-literature
| S-EPMC4282231 | biostudies-literature
| PRJEB49187 | ENA
2023-09-01 | GSE225380 | GEO