Dataset Information

Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data.

ABSTRACT: BACKGROUND: The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants, that is, DNA sequences not truly present in the sample. Contaminants come from various sources, including reagents. Appropriate laboratory practices can reduce contamination, but do not eliminate it. Here we introduce decontam (https://github.com/benjjneb/decontam), an open-source R package that implements a statistical classification procedure that identifies contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples and are often found in negative controls. RESULTS: Decontam classified amplicon sequence variants (ASVs) in a human oral dataset consistently with prior microscopic observations of the microbial taxa inhabiting that environment and previous reports of contaminant taxa. In metagenomics and marker-gene measurements of a dilution series, decontam substantially reduced technical variation arising from different sequencing protocols. The application of decontam to two recently published datasets corroborated and extended their conclusions that little evidence existed for an indigenous placenta microbiome and that some low-frequency taxa seemingly associated with preterm birth were contaminants. CONCLUSIONS: Decontam improves the quality of metagenomic and marker-gene sequencing by identifying and removing contaminant DNA sequences. Decontam integrates easily with existing MGS workflows and allows researchers to generate more accurate profiles of microbial communities at little to no additional cost.
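The two patterns named in the abstract can be illustrated with a small Python sketch. This is not decontam's implementation or API (decontam is an R package); the function names, the slope comparison, and the toy numbers below are all assumptions made for illustration. The frequency check assumes that a contaminant's relative frequency falls roughly in inverse proportion to sample DNA concentration (slope near -1 on a log-log scale), while a genuine sequence's frequency is roughly independent of concentration (slope near 0). The prevalence check simply asks whether a sequence is seen in a larger share of negative controls than of true samples.

```python
import math


def rss_fixed_slope(log_conc, log_freq, slope):
    """Residual sum of squares for log_freq = slope * log_conc + b,
    where the intercept b is chosen by least squares."""
    offsets = [y - slope * x for x, y in zip(log_conc, log_freq)]
    b = sum(offsets) / len(offsets)
    return sum((o - b) ** 2 for o in offsets)


def contaminant_like_by_frequency(conc, freq):
    """Hypothetical helper: True if a slope of -1 (contaminant-like)
    fits the log-log data better than a slope of 0 (genuine-like)."""
    pairs = [(math.log(c), math.log(f)) for c, f in zip(conc, freq)
             if c > 0 and f > 0]
    if len(pairs) < 2:
        return False  # too little data to compare the two models
    xs, ys = zip(*pairs)
    return rss_fixed_slope(xs, ys, -1.0) < rss_fixed_slope(xs, ys, 0.0)


def contaminant_like_by_prevalence(present_in_controls, present_in_samples):
    """Hypothetical helper: True if the sequence is detected in a larger
    share of negative controls than of true samples."""
    def share(flags):
        return sum(flags) / len(flags) if flags else 0.0
    return share(present_in_controls) > share(present_in_samples)


# Toy data (invented for illustration): DNA concentration per sample.
conc = [1.0, 2.0, 4.0, 8.0]
falls_with_conc = [0.4, 0.2, 0.1, 0.05]   # frequency halves as conc doubles
flat_across_conc = [0.10, 0.11, 0.09, 0.10]

print(contaminant_like_by_frequency(conc, falls_with_conc))
print(contaminant_like_by_frequency(conc, flat_across_conc))
print(contaminant_like_by_prevalence([True, True, False],
                                     [True, False, False, False, False]))
```

A real analysis would replace the hard comparison with a statistical test and a score threshold; the sketch only shows the direction of each signal.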

SUBMITTER: Davis NM

PROVIDER: S-EPMC6298009 | biostudies-literature | 2018 Dec

REPOSITORIES: biostudies-literature


Publications

Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data.

Davis NM, Proctor DM, Holmes SP, Relman DA, Callahan BJ

Microbiome, 2018 Dec 17, issue 1


PMID: 30558668

Similar Datasets

Project description: Earth's subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding which microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open-access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter, whereas the top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset consisted of contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low-biomass environments such as, but not limited to, portions of Earth's deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and that steps to remove contamination-derived sequences must be taken prior to using this dataset.

Project description: Two mutants of winter rapeseed (Brassica napus L. var. oleifera) with an increased amount of oleic acid in seeds were created by chemical mutagenesis (HOR3-M10453 and HOR4-M10464). The overall performance of the mutated plants was much lower than that of wild-type cultivars. Multiple rounds of crossing with high-yielding double-low ("00") cultivars and breeding lines having valuable agronomic traits, followed by selection of high oleic acid genotypes, are therefore needed to obtain new "00" varieties of rapeseed having high oleic acid content in seeds. To perform such selection, a specific codominant cleaved amplified polymorphic sequences (CAPS) marker was used. This marker was designed to detect the presence of two relevant point mutations in the desaturase gene BnaA.FAD2, and it was previously described and patented. The specific polymerase chain reaction product (732 bp) was digested using the FspBI restriction enzyme, which recognizes the 5'-C↓TAG-3' sequence that is common to both mutated alleles, thereby yielding band patterns specific for those alleles. The method proposed in the patent was redesigned, adjusted to specific laboratory conditions, and thoroughly tested. Different DNA extraction protocols were tested to optimize the procedure. Two variants of the CAPS method (with and without purification of the amplified product) were considered to choose the best option. In addition, the ability of the studied marker to detect heterozygosity in the BnaA.FAD2 locus was tested. Finally, we present some examples of the use of the new CAPS marker in marker-assisted selection (MAS) during our breeding programs. The standard CTAB method of DNA extraction and the simplified, two-step (amplification/digestion) procedure for the CAPS marker are recommended. The marker was found to be useful for the detection of two mutated alleles of the studied BnaA.FAD2 desaturase gene and can potentially assure breeders of the purity of their HOLL lines. However, it was also shown that it could not detect any other alleles or genes that were revealed to play a role in the regulation of oleic acid level.

