Project description:Most experiments studying bacterial microbiomes rely on the PCR amplification of all or part of the gene for the 16S rRNA subunit, which serves as a biomarker for identifying and quantifying the various taxa present in a microbiome sample. Several computational methods exist for analyzing 16S amplicon sequencing. However, the most-used bioinformatics tools cannot produce high quality genus-level or species-level taxonomic calls and may underestimate the potential accuracy of these calls. We used 16S sequencing data from mock bacterial communities to evaluate the sensitivity and specificity of several bioinformatics pipelines and genomic reference libraries used for microbiome analyses, concentrating on measuring the accuracy of species-level taxonomic assignments of 16S amplicon reads. We evaluated the tools DADA2, QIIME 2, Mothur, PathoScope 2, and Kraken 2 in conjunction with reference libraries from Greengenes, SILVA, Kraken 2, and RefSeq. Profiling tools were compared using publicly available mock community data from several sources, comprising 136 samples with varied species richness and evenness, several different amplified regions within the 16S rRNA gene, and both DNA spike-ins and cDNA from collections of plated cells. PathoScope 2 and Kraken 2, both tools designed for whole-genome metagenomics, outperformed DADA2, QIIME 2 using the DADA2 plugin, and Mothur, which are theoretically specialized for 16S analyses. Evaluations of reference libraries identified the SILVA and RefSeq/Kraken 2 Standard libraries as superior in accuracy compared to Greengenes. These findings support PathoScope and Kraken 2 as fully capable, competitive options for genus- and species-level 16S amplicon sequencing data analysis, whole genome sequencing, and metagenomics data tools.
Project description:Amplicon high-throughput sequencing of 16S ribosomal RNA (rRNA) gene is currently the most widely used technique to investigate complex gut microbial communities. Microbial identification might be influenced by several factors, including the choice of bioinformatic pipelines, making comparisons across studies difficult. Here, we compared four commonly used pipelines (QIIME2, Bioconductor, UPARSE and mothur) run on two operating systems (OS) (Linux and Mac), to evaluate the impact of bioinformatic pipeline and OS on the taxonomic classification of 40 human stool samples. We applied the SILVA 132 reference database for all the pipelines. We compared phyla and genera identification and relative abundances across the four pipelines using the Friedman rank sum test. QIIME2 and Bioconductor provided identical outputs on Linux and Mac OS, while UPARSE and mothur reported only minimal differences between OS. Taxa assignments were consistent at both phylum and genus level across all the pipelines. However, a difference in terms of relative abundance was identified for all phyla (p < 0.013) and for the majority of the most abundant genera (p < 0.028), such as Bacteroides (QIIME2: 24.5%, Bioconductor: 24.6%, UPARSE-linux: 23.6%, UPARSE-mac: 20.6%, mothur-linux: 22.2%, mothur-mac: 21.6%, p < 0.001). The use of different bioinformatic pipelines affects the estimation of the relative abundance of gut microbial community, indicating that studies using different pipelines cannot be directly compared. A harmonization procedure is needed to move the field forward.
Project description:Next generation sequencing technology has revolutionised microbiology by allowing concurrent analysis of whole microbial communities. Here we developed and verified similar methods for the analysis of fungal communities using a proton release sequencing platform with the ability to sequence reads of up to 400 bp in length at significant depth. This read length permits the sequencing of amplicons from commonly used fungal identification regions and thereby taxonomic classification. Using the 400 bp sequencing capability, we have sequenced amplicons from the ITS1, ITS2 and LSU fungal regions to a depth of approximately 700,000 raw reads per sample. Representative operational taxonomic units (OTUs) were chosen by the USEARCH algorithm, and identified taxonomically through nucleotide blast (BLASTn). Combination of this sequencing technology with the bioinformatics pipeline allowed species recognition in two controlled fungal spore populations containing members of known identity and concentration. Each species included within the two controlled populations was found to correspond to a representative OTU, and these OTUs were found to be highly accurate representations of true biological sequences. However, the absolute number of reads attributed to each OTU differed among species. The majority of species were represented by an OTU derived from all three genomic regions although in some cases, species were only represented in two of the regions due to the absence of conserved primer binding sites or due to sequence composition. It is apparent from our data that proton release sequencing technologies can deliver a qualitative assessment of the fungal members comprising a sample. The fact that some fungi cannot be amplified by specific "conserved" primer pairs confirms our recommendation that a multi-region approach be taken for other amplicon-based metagenomic studies.
Project description:Microbiome research relies on next-generation sequencing and on downstream data analysis workflows. Several manufacturers have introduced multi-amplicon kits for microbiome characterization, improving speciation, but present unique challenges for analysis. The goal of this methodology study was to develop two analysis pipelines specific to mixed-orientation reads from multi-hypervariable (V) region amplicons. A secondary aim was to assess agreement with expected abundance, considering database and variable region. Mock community sequence data (n = 41) generated using the Ion16S™ Metagenomics Kit and Ion Torrent Sequencing Platform were analyzed using two workflows. Amplicons from V2, V3, V4, V6-7, V8 and V9 were deconvoluted using a specialized plugin based on CutPrimers. A separate workflow using Cutadapt is also presented. Three reference databases (Ribosomal Database Project, Greengenes and Silva) were used for taxonomic assignment. Bray-Curtis, Euclidean and Jensen-Shannon distance measures were used to evaluate overall annotation consistency, and specific taxon agreement was determined by calculating the ratio of observed to expected relative abundance. Reads that mapped to regions V2-V9 varied for both CutPrimers and Cutadapt-based methods. Within the CutPrimers-based pipeline, V3 amplicons had the best agreement with the expected distribution, tested using global distance measures, while V9 amplicons had the worst agreement. Accurate taxonomic annotation varied by genus-level taxon and V region analyzed. For the first time, we present a microbiome analysis pipeline that employs a specialized plugin to allow microbiome researchers to separate multi-amplicon data from the Ion16S Metagenomics Kit into V-specific reads. We also present an additional analysis workflow, modified for Ion Torrent mixed orientation reads. Overall, the global agreement of amplicons with the expected mock community abundances differed across V regions and reference databases. Benchmarking data should be referenced when planning a microbiome study to consider these biases related to sequencing and data analysis for multi-amplicon sequencing kits.
Project description:IntroductionMetagenomic sequencing is increasingly being used in clinical settings for difficult to diagnose cases. The performance of viral metagenomic protocols relies to a large extent on the bioinformatic analysis. In this study, the European Society for Clinical Virology (ESCV) Network on NGS (ENNGS) initiated a benchmark of metagenomic pipelines currently used in clinical virological laboratories.MethodsMetagenomic datasets from 13 clinical samples from patients with encephalitis or viral respiratory infections characterized by PCR were selected. The datasets were analyzed with 13 different pipelines currently used in virological diagnostic laboratories of participating ENNGS members. The pipelines and classification tools were: Centrifuge, DAMIAN, DIAMOND, DNASTAR, FEVIR, Genome Detective, Jovian, MetaMIC, MetaMix, One Codex, RIEMS, VirMet, and Taxonomer. Performance, characteristics, clinical use, and user-friendliness of these pipelines were analyzed.ResultsOverall, viral pathogens with high loads were detected by all the evaluated metagenomic pipelines. In contrast, lower abundance pathogens and mixed infections were only detected by 3/13 pipelines, namely DNASTAR, FEVIR, and MetaMix. Overall sensitivity ranged from 80% (10/13) to 100% (13/13 datasets). Overall positive predictive value ranged from 71-100%. The majority of the pipelines classified sequences based on nucleotide similarity (8/13), only a minority used amino acid similarity, and 6 of the 13 pipelines assembled sequences de novo. No clear differences in performance were detected that correlated with these classification approaches. Read counts of target viruses varied between the pipelines over a range of 2-3 log, indicating differences in limit of detection.ConclusionA wide variety of viral metagenomic pipelines is currently used in the participating clinical diagnostic laboratories. Detection of low abundant viral pathogens and mixed infections remains a challenge, implicating the need for standardization and validation of metagenomic analysis for clinical diagnostic use. Future studies should address the selective effects due to the choice of different reference viral databases.
Project description:Shotgun metagenomic sequencing comprehensively samples the DNA of a microbial sample. Choosing the best bioinformatics processing package can be daunting due to the wide variety of tools available. Here, we assessed publicly available shotgun metagenomics processing packages/pipelines including bioBakery, Just a Microbiology System (JAMS), Whole metaGenome Sequence Assembly V2 (WGSA2), and Woltka using 19 publicly available mock community samples and a set of five constructed pathogenic gut microbiome samples. Also included is a workflow for labelling bacterial scientific names with NCBI taxonomy identifiers for better resolution in assessing results. The Aitchison distance, a sensitivity metric, and total False Positive Relative Abundance were used for accuracy assessments for all pipelines and mock samples. Overall, bioBakery4 performed the best with most of the accuracy metrics, while JAMS and WGSA2, had the highest sensitivities. Furthermore, bioBakery is commonly used and only requires a basic knowledge of command line usage. This work provides an unbiased assessment of shotgun metagenomics packages and presents results assessing the performance of the packages using mock community sequence data.
Project description:The rapid development of sequencing technologies has provided access to environments that were either once thought inhospitable to life altogether or that contain too few cells to be analyzed using genomics approaches. While 16S rRNA gene microbial community sequencing has revolutionized our understanding of community composition and diversity over time and space, it only provides a crude estimate of microbial functional and metabolic potential. Alternatively, shotgun metagenomics allows comprehensive sampling of all genetic material in an environment, without any underlying primer biases. Until recently, one of the major bottlenecks of shotgun metagenomics has been the requirement for large initial DNA template quantities during library preparation.Here, we investigate the effects of varying template concentrations across three low biomass library preparation protocols on their ability to accurately reconstruct a mock microbial community of known composition. We analyze the effects of input DNA quantity and library preparation method on library insert size, GC content, community composition, assembly quality and metagenomic binning. We found that library preparation method and the amount of starting material had significant impacts on the mock community metagenomes. In particular, GC content shifted towards more GC rich sequences at the lower input quantities regardless of library prep method, the number of low quality reads that could not be mapped to the reference genomes increased with decreasing input quantities, and the different library preparation methods had an impact on overall metagenomic community composition.This benchmark study provides recommendations for library creation of representative and minimally biased metagenome shotgun sequencing, enabling insights into functional attributes of low biomass ecosystem microbial communities.
Project description:Microbial amplicon sequencing studies are an important tool in biological and biomedical research. Widespread 16S rRNA gene microbial surveys have shed light on the structure of many ecosystems inhabited by bacteria, including the human body. However, specialized software and algorithms are needed to convert raw sequencing data into biologically meaningful information (i.e. tables of bacterial counts). While different bioinformatic pipelines are available in a rapidly changing and improving field, users are often unaware of limitations and biases associated with individual pipelines and there is a lack of agreement regarding best practices. Here, we compared six bioinformatic pipelines for the analysis of amplicon sequence data: three OTU-level flows (QIIME-uclust, MOTHUR, and USEARCH-UPARSE) and three ASV-level (DADA2, Qiime2-Deblur, and USEARCH-UNOISE3). We tested workflows with different quality control options, clustering algorithms, and cutoff parameters on a mock community as well as on a large (N = 2170) recently published fecal sample dataset from the multi-ethnic HELIUS study. We assessed the sensitivity, specificity, and degree of consensus of the different outputs. DADA2 offered the best sensitivity, at the expense of decreased specificity compared to USEARCH-UNOISE3 and Qiime2-Deblur. USEARCH-UNOISE3 showed the best balance between resolution and specificity. OTU-level USEARCH-UPARSE and MOTHUR performed well, but with lower specificity than ASV-level pipelines. QIIME-uclust produced large number of spurious OTUs as well as inflated alpha-diversity measures and should be avoided in future studies. This study provides guidance for researchers using amplicon sequencing to gain biological insights.