Dataset Information

AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data.


ABSTRACT:

Motivation

Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed 'denoising' methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near more frequent sequences, because they ignore the sequencing quality information.

Results

We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI considers the quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI has better performance than three popular denoising methods, with acceptable computation time and memory usage.
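To illustrate why quality information can help separate a rare true variant from a sequencing error, the sketch below scores a read against a candidate error-free sequence using Phred quality scores. It is not AmpliCI's error model: the function names are hypothetical, and it assumes substitution errors only, independent positions, and an error that is equally likely to produce any of the three other bases. The only standard fact it relies on is the Phred definition, p = 10^(-Q/10).

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_log_likelihood(read, quals, candidate):
    """Log-likelihood of `read` given that `candidate` is the true sequence.

    Simplifying assumptions (not taken from the paper): substitutions only,
    independent positions, errors uniform over the three other bases.
    """
    if not (len(read) == len(quals) == len(candidate)):
        raise ValueError("read, quals and candidate must have equal length")
    log_lik = 0.0
    for base, q, true_base in zip(read, quals, candidate):
        # Cap p at 0.75 (a random base call) so log(1 - p) stays finite.
        p = min(phred_to_error_prob(q), 0.75)
        if base == true_base:
            log_lik += math.log1p(-p)
        else:
            log_lik += math.log(p / 3)
    return log_lik


candidate = "ACGT"
read = "ACGA"  # differs from the candidate at the last position

# Same mismatch, different quality at the mismatching base.
low_q = read_log_likelihood(read, [40, 40, 40, 2], candidate)
high_q = read_log_likelihood(read, [40, 40, 40, 40], candidate)
print(low_q > high_q)
```

Under these assumptions, a mismatch at a low-quality position is much more plausible as an error of the candidate than the same mismatch at a high-quality position, which is the kind of evidence a quality-blind method discards.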

Availability and implementation

Source code is available at https://github.com/DormanLab/AmpliCI.

Supplementary information

Supplementary material is available at Bioinformatics online.

SUBMITTER: Peng X 

PROVIDER: S-EPMC7850112 | biostudies-literature | 2021 Jan

REPOSITORIES: biostudies-literature

Publications

AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data.

Xiyu Peng, Karin S. Dorman

Bioinformatics (Oxford, England) 20210101 21


Similar Datasets

| S-EPMC4927377 | biostudies-literature
| S-EPMC4850673 | biostudies-literature
| S-EPMC8733986 | biostudies-literature
| S-EPMC6865567 | biostudies-literature
| S-EPMC6765106 | biostudies-literature
| S-EPMC2241869 | biostudies-literature
| S-EPMC3982975 | biostudies-literature
| S-EPMC7881719 | biostudies-literature
| S-EPMC11230634 | biostudies-literature
| S-EPMC4690345 | biostudies-literature