Dataset Information

Implications of pyrosequencing error correction for biological data interpretation.

ABSTRACT: There has been a rapid proliferation of approaches for processing and manipulating second generation DNA sequence data. However, users are often left with uncertainties about how the choice of processing methods may impact biological interpretation of data. In this report, we probe differences in output between two different processing pipelines: a de-noising approach using the AmpliconNoise algorithm for error correction, and a standard approach using quality filtering and preclustering to reduce error. There was a large overlap in reads culled by each method, although AmpliconNoise removed a greater net number of reads. Most OTUs produced by one method had a clearly corresponding partner in the other. Although each method resulted in OTUs consisting entirely of reads that were culled by the other method, there were many more such OTUs formed in the standard pipeline. Total OTU richness was reduced by AmpliconNoise processing, but per-sample OTU richness, diversity and evenness were increased. Increases in per-sample richness and diversity may be a result of AmpliconNoise processing producing a more even OTU rank-abundance distribution. Because communities were randomly subsampled to equalize sample size across communities, and because rare sequence variants are less likely to be selected during subsampling, fewer OTUs were lost from individual communities when subsampling AmpliconNoise-processed data. In contrast to taxon-based diversity estimates, phylogenetic diversity was reduced even on a per-sample basis by de-noising, and samples switched widely in diversity rankings. This work illustrates the significant impacts of processing pipelines on the biological interpretations that can be made from pyrosequencing surveys. This study provides important cautions for analyses of contemporary data, for requisite data archiving (processed vs. non-processed data), and for drawing comparisons among studies performed using distinct data processing pipelines.
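The subsampling argument in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the two communities below are hypothetical, and the function names (`expected_richness`, `shannon`, `pielou_evenness`) are assumptions made for illustration. The sketch uses the classical rarefaction expectation (the chance that an OTU with n reads survives a random draw of d reads out of N) to show why a more even rank-abundance distribution loses fewer OTUs when samples are subsampled to a common depth.

```python
import math

def expected_richness(counts, depth):
    """Expected number of OTUs retained after drawing `depth` reads
    without replacement (classical rarefaction expectation)."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    denom = math.comb(total, depth)
    # An OTU with n reads is missed only if all `depth` reads come from the
    # other (total - n) reads; math.comb returns 0 when that is impossible.
    return sum(1 - math.comb(total - n, depth) / denom
               for n in counts.values() if n > 0)

def shannon(counts):
    """Shannon diversity index H' (natural log)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)

def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S); defined as 0 for a single OTU."""
    s = sum(1 for n in counts.values() if n > 0)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

# Hypothetical communities, 1000 reads each (illustrative values only).
# "skewed": one dominant OTU plus 19 singletons, as rare variants might appear.
skewed = {"otu_dominant": 981, **{f"otu_rare_{i}": 1 for i in range(19)}}
# "even": 10 OTUs with 100 reads each.
even = {f"otu_{i}": 100 for i in range(10)}

for name, community in (("skewed", skewed), ("even", even)):
    print(name,
          "total OTUs:", len(community),
          "expected OTUs at depth 100:", round(expected_richness(community, 100), 2),
          "evenness:", round(pielou_evenness(community), 3))
```

In this toy case the skewed community holds twice as many OTUs in total, yet it is expected to retain fewer of them after subsampling to 100 reads than the even community. That matches the pattern described in the abstract, where total richness and per-sample richness move in opposite directions.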

SUBMITTER: Bakker MG

PROVIDER: S-EPMC3431371 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature

Publications

Implications of pyrosequencing error correction for biological data interpretation.

Bakker MG, Tu ZJ, Bradeen JM, Kinkel LL

PLoS ONE, 2012-08-30, issue 8

PMID: 22952965

Similar Datasets

NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads.
| S-EPMC4403973 | biostudies-literature

Efficient measurement error correction with spatially misaligned data.
| S-EPMC3169665 | biostudies-literature

PCR-induced transitions are the major source of error in cleaned ultra-deep pyrosequencing data.
| S-EPMC3720931 | biostudies-literature

Minimum error correction-based haplotype assembly: Considerations for long read data.
| S-EPMC7292361 | biostudies-literature

Identification and correction of systematic error in high-throughput sequence data.
| S-EPMC3295828 | biostudies-literature

Benchmarking of computational error-correction methods for next-generation sequencing data.
| S-EPMC7079412 | biostudies-literature

Indel and Carryforward Correction (ICC): a new analysis approach for processing 454 pyrosequencing data.
| S-EPMC3777116 | biostudies-literature

Covariate measurement error correction methods in mediation analysis with failure time data.
| S-EPMC4276494 | biostudies-literature

BioMethyl: an R package for biological interpretation of DNA methylation data.
| S-EPMC6761945 | biostudies-literature

Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies.
| S-EPMC2995073 | biostudies-literature