Dataset Information

MetaGaAP: A Novel Pipeline to Estimate Community Composition and Abundance from Non-Model Sequence Data.

ABSTRACT: Next-generation sequencing and bioinformatic approaches are increasingly used to quantify microorganisms within populations by analysis of 'meta-barcode' data. This approach relies on comparing amplicon sequences of 'barcode' regions from a population with public-domain databases of reference sequences. However, for many organisms, relevant 'barcode' regions may not have been identified and large databases of reference sequences may not be available. A workflow and software pipeline, 'MetaGaAP', was developed to identify and quantify genotypes through four steps: shotgun sequencing and identification of polymorphisms in a metapopulation to identify custom 'barcode' regions of fewer than 30 polymorphisms within the span of a single 'read'; amplification and sequencing of the 'barcode'; generation of a custom database of polymorphisms; and quantitation of the relative abundance of genotypes. The pipeline and workflow were validated in a 'wild type' Alphabaculovirus isolate, Helicoverpa armigera single nucleopolyhedrovirus (HaSNPV-AC53), and a tissue-culture derived strain (HaSNPV-AC53-T2). The approach was validated by comparison of polymorphisms in amplicons and shotgun data, and by comparison of predicted dominant and co-dominant genotypes with Sanger sequences. The computational power required to generate and search the database effectively limits the number of polymorphisms that can be included in a barcode to 30 or fewer. The approach can be used in quantitative analysis of the ecology and pathology of non-model organisms.
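The last two steps of the workflow (building a custom database from the polymorphisms in a barcode, then quantifying relative genotype abundance) can be illustrated with a minimal sketch. This is not the MetaGaAP implementation. It assumes every polymorphism is biallelic and that reads are assigned by exact match, neither of which is stated in the abstract, and the names `candidate_genotypes`, `relative_abundance`, `reference`, `variants` and `reads` are hypothetical. Under those assumptions a barcode with n polymorphisms yields 2^n candidate genotypes, which may help explain why the database becomes impractical to generate and search beyond roughly 30 polymorphisms.

```python
from collections import Counter
from itertools import product


def candidate_genotypes(reference, variants):
    """Yield every combination of reference/alternate alleles.

    `variants` maps a 0-based position in `reference` to its alternate base.
    Assumption (not from the source): every site is biallelic.
    """
    positions = sorted(variants)
    for choice in product((False, True), repeat=len(positions)):
        seq = list(reference)
        for pos, use_alt in zip(positions, choice):
            if use_alt:
                seq[pos] = variants[pos]
        yield "".join(seq)


def relative_abundance(reads, database):
    """Return the fraction of exactly matching reads assigned to each genotype."""
    counts = Counter(read for read in reads if read in database)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}


# Toy barcode with two hypothetical polymorphic sites.
reference = "ACGTACGT"
variants = {1: "T", 5: "A"}
database = set(candidate_genotypes(reference, variants))

# Toy amplicon reads; the last one matches no genotype and is ignored.
reads = ["ACGTACGT", "ACGTACGT", "ATGTACGT", "ATGTAAGT", "GGGGGGGG"]
abundance = relative_abundance(reads, database)

# Under the biallelic assumption, 30 sites already give over a billion genotypes.
database_size_at_30 = 2 ** 30
```

In this toy case the database holds four genotypes and the reference genotype accounts for half of the matched reads. A real pipeline would also need to handle sequencing errors and multi-allelic sites, which this sketch deliberately leaves out.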

SUBMITTER: Noune C

PROVIDER: S-EPMC5372007 | biostudies-literature | 2017 Feb

REPOSITORIES: biostudies-literature

Publications

MetaGaAP: A Novel Pipeline to Estimate Community Composition and Abundance from Non-Model Sequence Data.

Noune C, Hauxwell C

Biology, 2017 Feb 17, issue 1

PMID: 28218638

Similar Datasets

Project description: Next-generation sequencing (NGS) and metabarcoding approaches are increasingly applied to wild animal populations, but there is a disconnect between the widely applied generalized linear mixed model (GLMM) approaches commonly used to study phenotypic variation and the statistical toolkit from community ecology typically applied to metabarcoding data. Here, we describe the suitability of a novel GLMM-based approach for analyzing the taxon-specific sequence read counts derived from standard metabarcoding data. This approach allows decomposition of the contribution of different drivers to variation in community composition (e.g., age, season, individual) via interaction terms in the model random-effects structure. We provide guidance to implementing this approach and show how these models can identify how responsible specific taxonomic groups are for the effects attributed to different drivers. We applied this approach to two cross-sectional data sets from the Soay sheep population of St. Kilda. GLMMs showed agreement with dissimilarity-based approaches highlighting the substantial contribution of age and minimal contribution of season to microbiota community compositions, and simultaneously estimated the contribution of other technical and biological factors. We further used model predictions to show that age effects were principally due to increases in taxa of the phylum Bacteroidetes and declines in taxa of the phylum Firmicutes. This approach offers a powerful means for understanding the influence of drivers of community structure derived from metabarcoding data. We discuss how our approach could be readily adapted to allow researchers to estimate contributions of additional factors such as host or microbe phylogeny to answer emerging questions surrounding the ecological and evolutionary roles of within-host communities.

IMPORTANCE: NGS and fecal metabarcoding methods have provided powerful opportunities to study the wild gut microbiome. A wealth of data is, therefore, amassing across wild systems, generating the need for analytical approaches that can appropriately investigate simultaneous factors at the host and environmental scale that determine the composition of these communities. Here, we describe a generalized linear mixed-effects model (GLMM) approach to analyze read count data from metabarcoding of the gut microbiota, allowing us to quantify the contributions of multiple host and environmental factors to within-host community structure. Our approach provides outputs that are familiar to a majority of field ecologists and can be run using any standard mixed-effects modeling packages. We illustrate this approach using two metabarcoding data sets from the Soay sheep population of St. Kilda investigating age and season effects as worked examples.

Project description:

Background: Due to the copious disposal of plastics, marine ecosystems receive a large part of this waste. Microplastics (MPs) are solid particles smaller than 5 millimeters in size. Among the plastic polymers, polystyrene (PS) is one of the most commonly used and discarded. Due to its density being greater than that of water, it accumulates in marine sediments, potentially affecting benthic communities. This study investigated the ingestion of MP and their effect on the meiofauna community of a sandy beach. Meiofauna are an important trophic link between the basal and higher trophic levels of sedimentary food webs and may therefore be substantially involved in trophic transfer of MP and their associated compounds.

Methods: We incubated microcosms without addition of MP (controls) and treatments contaminated with PS MP (1-µm) in marine sediments at three nominal concentrations (10^3, 10^5, 10^7 particles/mL), for nine days, and sampled for meiofauna with collections every three days. At each sampling time, meiofauna were collected, quantified and identified to higher-taxon level, and ingestion of MP was quantified under an epifluorescence microscope.

Results: Except for Tardigrada, all meiofauna taxa (Nematoda, turbellarians, Copepoda, Nauplii, Acari and Gastrotricha) ingested MP. Absorption was strongly dose dependent, being highest at 10^7 particles/mL, very low at 10^5 particles/mL and non-demonstrable at 10^3 particles/mL. Nematodes accumulated MP mainly in the intestine; MP abundance in the intestine increased with increasing incubation time. The total meiofauna density and species richness were significantly lower at the lowest MP concentration, while at the highest concentration these parameters were very similar to the control. In contrast, Shannon-Wiener diversity and evenness were greater in treatments with low MP concentration. However, these results should be interpreted with caution because of the low meiofauna abundances at the lower two MP concentrations.

Conclusion: At the highest MP concentration, abundance, taxonomic diversity and community structure of a beach meiofauna community were not significantly affected, suggesting that MP effects on meiofauna are at most subtle. However, lower MP concentrations did cause substantial declines in abundance and diversity, in line with previous studies at the population and community level. While we can only speculate on the underlying mechanism(s) of this counterintuitive response, results suggest that further research is needed to better understand MP effects on marine benthic communities.
