Dataset Information

Dadasnake, a Snakemake implementation of DADA2 to process amplicon sequencing data for microbial ecology.

ABSTRACT: Background: Amplicon sequencing of phylogenetic marker genes, e.g., 16S, 18S, or ITS ribosomal RNA sequences, is still the most commonly used method to determine the composition of microbial communities. Microbial ecologists often have expert knowledge on their biological question and data analysis in general, and most research institutes have computational infrastructures to use the bioinformatics command line tools and workflows for amplicon sequencing analysis, but requirements of bioinformatics skills often limit the efficient and up-to-date use of computational resources.

Results: We present dadasnake, a user-friendly, 1-command Snakemake pipeline that wraps the preprocessing of sequencing reads and the delineation of exact sequence variants by using the favorably benchmarked and widely used DADA2 algorithm with a taxonomic classification and the post-processing of the resultant tables, including hand-off in standard formats. The suitability of the provided default configurations is demonstrated using mock community data from bacteria and archaea, as well as fungi.

Conclusions: By use of Snakemake, dadasnake makes efficient use of high-performance computing infrastructures. Easy user configuration guarantees flexibility of all steps, including the processing of data from multiple sequencing platforms. It is easy to install dadasnake via conda environments. dadasnake is available at https://github.com/a-h-b/dadasnake.

SUBMITTER: Weißbecker C

PROVIDER: S-EPMC7702218 | biostudies-literature | 2020 Nov

REPOSITORIES: biostudies-literature

Publications

Dadasnake, a Snakemake implementation of DADA2 to process amplicon sequencing data for microbial ecology.

Weißbecker C, Schnabel B, Heintz-Buschart A

GigaScience, 2020-11-01, issue 12

PMID: 33252655

Similar Datasets

Project description: High-throughput amplicon sequencing is a critical tool for studying microbiota; however, it results only in relative abundance data. Thus, changes in absolute abundance of microbiota cannot be determined, which hinders further microbiology research. We have therefore established a gradient internal standard absolute quantification (GIS-AQ) method to overcome this issue, which can simultaneously obtain the absolute abundances of bacteria and fungi. Deviations from the quantitative equations of microbes and internal standards were eliminated through calibration. Compared with traditional quantitative real-time PCR and microscopy quantifications, this method is reliable (average R² = 0.998; P < 0.001) and accurate (P > 0.05 for internal standards versus microscopy). The GIS-AQ method can be adapted to any amplicon primer choice (e.g., 336F/806R and ITS3/ITS4), rendering it applicable to ecosystem studies including food, soil, and water samples. Crucially, when using solid-state fermentation samples from various temporal dimensions, the results obtained from the relative and absolute abundance are different. The absolute abundance can be used to study the difference in communities between different samples, and the GIS-AQ method allows this to be done rapidly. Therefore, combining the absolute abundance with relative abundance can accurately reflect the microbiota composition. IMPORTANCE: To address the inability of amplicon sequencing to discern the absolute abundance of microbiota, we proposed a gradient internal standard absolute quantification method. We used Chinese liquor fermentation as a model system to demonstrate the reliability and accuracy of the method. By comparing the relative and absolute abundances of microbiota in various temporal dimensions, we found that the absolute abundance of communities showed dynamic changes that differed from those in the relative abundance. Based on its design principle, this method can be widely applied to different ecosystems. Therefore, we believe that the GIS-AQ method can play an immeasurably useful role in microbiological research.

Project description: Microorganisms are ubiquitous in the biosphere, playing a crucial role in both biogeochemistry of the planet and human health. However, identifying these microorganisms and defining their function are challenging. Widely used approaches in comparative metagenomics, 16S amplicon sequencing and whole genome shotgun sequencing (WGS), have provided access to DNA sequencing analysis to identify microorganisms and evaluate diversity and abundance in various environments. However, advances in parallel high-throughput DNA sequencing in the past decade have introduced major hurdles, namely standardization of methods, data storage, reproducible interoperability of results, and data sharing. The National Ecological Observatory Network (NEON), established by the National Science Foundation, enables all researchers to address queries on a regional to continental scale around a variety of environmental challenges and provides high-quality, integrated, and standardized data from field sites across the U.S. As the amount of metagenomic data continues to grow, standardized procedures that allow results across projects to be assessed and compared are becoming increasingly important in the field of metagenomics. We demonstrate the feasibility of using publicly available NEON soil metagenomic sequencing datasets in combination with the open-access Metagenomics Rapid Annotation using the Subsystem Technology (MG-RAST) server to illustrate advantages of WGS compared to 16S amplicon sequencing. Four WGS and four 16S amplicon sequence datasets were selected for comparison, all from surface soil samples prepared by NEON investigators using standardized protocols and collected at the same locations in Colorado between April and July 2014. The dominant bacterial phyla detected across samples agreed between sequencing methodologies. However, WGS yielded greater microbial resolution, increased accuracy, and allowed identification of more genera of bacteria, archaea, viruses, and eukaryota, and putative functional genes that would have gone undetected using 16S amplicon sequencing. NEON open data will be useful for future studies characterizing and quantifying complex ecological processes associated with changing aquatic and terrestrial ecosystems.
