Assessing the consequences of denoising marker-based metagenomic data.
ABSTRACT: Early marker-based metagenomic studies were performed without properly accounting for the effects of noise (sequencing errors, PCR single-base errors, and PCR chimeras). Denoising algorithms have been developed, but they were validated using data derived from mock communities, in which the true sequences were known. Since the algorithms were designed to be used in real community studies, it is important to evaluate the results in such cases. With this goal in mind, we processed a real 16S rRNA metagenomic dataset through five denoising pipelines. By reconstituting the sequence reads at each stage of the pipelines, we determined how the reads were being altered. In one denoising pipeline, AmpliconNoise, we found that the algorithm designed to remove pyrosequencing errors changed the reads in a manner inconsistent with the known spectrum of those errors until one of its parameters was increased substantially from its default value. Additionally, because the longest read was picked as the representative for each cluster, sequences were appended to the 3' ends of shorter reads that were often dissimilar to what the truncations of the previous filtering step had removed. In QIIME, the denoising algorithm caused a much larger number of changes to the reads unless its parameters were changed from their defaults. The denoising pipeline in mothur avoided some of these negative side effects because of its strict default filtering criteria, but those criteria also greatly limited the sequence information produced at the end of the pipeline. We recommend that users of these denoising pipelines be cognizant of these issues and examine how their reads are transformed by the denoising process as part of their analysis.
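The central methodological step above is a read-level audit: reconstitute the reads before and after each pipeline stage, pair them by ID, and tally how each read was altered. The following is a minimal Python sketch of that kind of comparison, not code from the paper; the input file names and the substitution/insertion/deletion breakdown are illustrative assumptions.

# Sketch: compare reads reconstituted before and after one denoising stage
# and count how many bases were substituted, inserted, or deleted.
import difflib
from collections import Counter

def read_fasta(path):
    """Parse a FASTA file into {read_id: sequence} (ID taken up to first space)."""
    seqs, name = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            elif name:
                seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}

def classify(before, after):
    """Tally base-level differences between two versions of the same read."""
    counts = Counter()
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Simplification: count the larger side of a replaced block.
            counts["substituted"] += max(i2 - i1, j2 - j1)
        elif tag == "insert":
            counts["inserted"] += j2 - j1   # bases added by the stage
        elif tag == "delete":
            counts["deleted"] += i2 - i1    # bases removed by the stage
    return counts

# Hypothetical inputs: the same reads reconstituted before and after a stage.
pre = read_fasta("reads_before_stage.fasta")
post = read_fasta("reads_after_stage.fasta")

totals = Counter()
for read_id, seq in pre.items():
    if read_id in post:
        totals += classify(seq, post[read_id])
print(dict(totals))  # e.g. {'substituted': ..., 'inserted': ..., 'deleted': ...}

Comparing these tallies against the known pyrosequencing error spectrum (dominated by homopolymer-length indels) is one way to flag a stage, such as the 3'-end extensions from longest-read cluster representatives described above, that alters reads in unexpected ways.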
SUBMITTER: Gaspar JM
PROVIDER: S-EPMC3607570 | biostudies-literature | 2013
REPOSITORIES: biostudies-literature