
Dataset Information


AmpliCI: A High-resolution Model-Based Approach for Denoising Illumina Amplicon Data.


ABSTRACT:

Motivation

Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed "denoising" methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near more frequent sequences, because they ignore sequencing quality information.

Results

We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI takes into account quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI has better performance than three popular denoising methods, with acceptable computation time and memory usage.
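To make the idea of a quality-aware mixture model with greedy selection concrete, here is a minimal sketch. It is not AmpliCI's implementation: the function names, the substitution-only error model (Phred score converted to an error probability, spread uniformly over the three other bases), the abundance-based mixture weights, and the `min_gain` stopping threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

def phred_to_error(q):
    """Phred score -> error probability, clamped so log(1 - p) stays finite."""
    return min(10 ** (-q / 10), 0.75)

def read_loglik(read, quals, center):
    """Log-probability of `read` if `center` is the true sequence.
    Assumes equal lengths and substitution errors only (illustrative model)."""
    total = 0.0
    for base, q, true_base in zip(read, quals, center):
        p = phred_to_error(q)
        total += math.log(1 - p) if base == true_base else math.log(p / 3)
    return total

def mixture_loglik(reads, centers, weights):
    """Log-likelihood of all reads under a finite mixture of candidate centers."""
    total = 0.0
    for seq, quals in reads:
        terms = [math.log(w) + read_loglik(seq, quals, c)
                 for c, w in zip(centers, weights)]
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

def greedy_denoise(reads, min_gain=5.0):
    """Greedily add unique sequences, most abundant first, while the mixture
    log-likelihood improves by more than `min_gain` (a hypothetical threshold)."""
    counts = Counter(seq for seq, _ in reads)
    candidates = [seq for seq, _ in counts.most_common()]

    def score(centers):
        # Crude weights: relative abundance of each center among the reads.
        total = sum(counts[c] for c in centers)
        return mixture_loglik(reads, centers, [counts[c] / total for c in centers])

    centers = [candidates[0]]
    best = score(centers)
    for cand in candidates[1:]:
        trial = score(centers + [cand])
        if trial - best > min_gain:
            centers.append(cand)
            best = trial
    return centers

# Toy data: a dominant sequence, a high-quality minor variant,
# and a singleton whose only mismatch has a very low quality score.
demo_reads = (
    [("ACGTACGT", [40] * 8)] * 50
    + [("ACGTACGA", [40] * 8)] * 10
    + [("ACGTACGC", [40] * 7 + [2])]
)
print(greedy_denoise(demo_reads))
```

In this toy input the minor variant supported by ten high-quality reads is kept as its own sequence, while the singleton whose mismatch sits at a low-quality position is absorbed as an error. That is the behavior quality information is meant to enable: a one-nucleotide neighbor of an abundant sequence can be retained or rejected based on evidence rather than a fixed similarity threshold.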

Availability

Source code is available at https://github.com/DormanLab/AmpliCI.

Supplementary information

Supplementary materials are available at Bioinformatics online.

SUBMITTER: Peng X 

PROVIDER: S-EPMC7850112 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature


Publications

AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data.

Xiyu Peng, Karin S. Dorman

Bioinformatics (Oxford, England), 2021-01-01, issue 21



Similar Datasets

| S-EPMC4927377 | biostudies-literature
| S-EPMC4850673 | biostudies-literature
| S-EPMC8733986 | biostudies-literature
| S-EPMC6865567 | biostudies-literature
| S-EPMC6765106 | biostudies-literature
| S-EPMC2241869 | biostudies-literature
| S-EPMC3982975 | biostudies-literature
| S-EPMC3018808 | biostudies-other
| S-EPMC7881719 | biostudies-literature
| S-EPMC9075697 | biostudies-literature