ABSTRACT: Motivation
Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed "denoising" methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near more frequent sequences, because they ignore the sequencing quality information.
Results
We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI takes into account quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI has better performance than three popular denoising methods, with acceptable computation time and memory usage.
Availability
Source code is available at https://github.com/DormanLab/AmpliCI.
Supplementary information
Supplementary materials are available at Bioinformatics online.
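Illustrative sketch
The abstract states that quality information helps separate true low-frequency variants from errors. The following Python sketch illustrates that idea only; it is not AmpliCI's model or code. It assumes the standard Phred relation between a quality score and an error probability, independent sites, and errors spread uniformly over the three alternative bases. The function names `phred_to_error_prob` and `read_log_likelihood` are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Standard Phred relation: P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_log_likelihood(read, quals, candidate):
    """Log-likelihood of a read given a candidate true sequence.

    Assumptions (for illustration, not taken from the paper):
    sites are independent, and an error is equally likely to
    produce any of the three other bases.
    """
    if not (len(read) == len(quals) == len(candidate)):
        raise ValueError("read, quals and candidate must have equal length")
    ll = 0.0
    for base, q, true_base in zip(read, quals, candidate):
        # Cap at 0.75 so that log(1 - p) stays defined for very low Q.
        p = min(phred_to_error_prob(q), 0.75)
        if base == true_base:
            ll += math.log(1 - p)
        else:
            ll += math.log(p / 3)
    return ll


# The read differs from the candidate at the last position.
read, candidate = "ACGT", "ACGA"
ll_high_quality = read_log_likelihood(read, [40, 40, 40, 40], candidate)
ll_low_quality = read_log_likelihood(read, [40, 40, 40, 2], candidate)
print(ll_high_quality, ll_low_quality)
```

Under these assumptions, a mismatch at a high-quality position makes the candidate a much poorer explanation of the read than the same mismatch at a low-quality position, so the read is better explained as a distinct variant. A method that ignores quality scores treats both cases identically.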
SUBMITTER: Peng X
PROVIDER: S-EPMC7850112 | biostudies-literature | 2020 Jul
REPOSITORIES: biostudies-literature
Bioinformatics (Oxford, England) 20210101 21