
Dataset Information


MetaGaAP: A Novel Pipeline to Estimate Community Composition and Abundance from Non-Model Sequence Data.


ABSTRACT: Next-generation sequencing and bioinformatic approaches are increasingly used to quantify microorganisms within populations by analysis of 'meta-barcode' data. This approach relies on comparing amplicon sequences of 'barcode' regions from a population with public-domain databases of reference sequences. However, for many organisms, relevant 'barcode' regions may not have been identified, and large databases of reference sequences may not be available. A workflow and software pipeline, 'MetaGaAP', was developed to identify and quantify genotypes through four steps: (1) shotgun sequencing and identification of polymorphisms in a metapopulation to identify custom 'barcode' regions of fewer than 30 polymorphisms within the span of a single 'read'; (2) amplification and sequencing of the 'barcode'; (3) generation of a custom database of polymorphisms; and (4) quantitation of the relative abundance of genotypes. The pipeline and workflow were validated in a 'wild type' Alphabaculovirus isolate, Helicoverpa armigera single nucleopolyhedrovirus (HaSNPV-AC53), and a tissue-culture derived strain (HaSNPV-AC53-T2). The approach was validated by comparison of polymorphisms in amplicons and shotgun data, and by comparison of predicted dominant and co-dominant genotypes with Sanger sequences. The computational power required to generate and search the database effectively limits the number of polymorphisms that can be included in a barcode to 30 or fewer. The approach can be used in quantitative analysis of the ecology and pathology of non-model organisms.
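The limit of roughly 30 polymorphisms per barcode can be illustrated with a small sketch. It assumes, as a simplification not stated in the abstract, that the custom database enumerates every combination of reference and alternate alleles at biallelic sites, so that its size doubles with each added polymorphism. The function names `enumerate_genotypes` and `database_size` and the toy sequence are hypothetical and are not part of MetaGaAP.

```python
from itertools import product


def enumerate_genotypes(reference, variants):
    """Yield every candidate barcode sequence for a set of biallelic sites.

    reference: barcode reference sequence (str)
    variants:  dict mapping 0-based position -> alternate base
    Assumption: each site carries either the reference or the alternate base.
    """
    positions = sorted(variants)
    # One True/False choice per site: False keeps the reference base.
    for choice in product((False, True), repeat=len(positions)):
        seq = list(reference)
        for pos, use_alt in zip(positions, choice):
            if use_alt:
                seq[pos] = variants[pos]
        yield "".join(seq)


def database_size(n_sites, alleles_per_site=2):
    """Number of candidate genotypes under the all-combinations assumption."""
    return alleles_per_site ** n_sites


# Toy example: three polymorphic sites give 2**3 = 8 candidate genotypes.
toy = list(enumerate_genotypes("ACGTACGT", {1: "T", 4: "G", 6: "A"}))
print(len(toy), toy[0], toy[-1])

# Growth of the candidate database with the number of sites.
for n in (10, 20, 30):
    print(n, database_size(n))
```

Under this assumption, 30 sites already correspond to more than a billion candidate sequences, which is consistent with the abstract's remark that computational power constrains barcode size.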

SUBMITTER: Noune C 

PROVIDER: S-EPMC5372007 | biostudies-literature | 2017 Feb

REPOSITORIES: biostudies-literature


Publications

MetaGaAP: A Novel Pipeline to Estimate Community Composition and Abundance from Non-Model Sequence Data.

Noune C, Hauxwell C

Biology, 2017-02-17, issue 1



Similar Datasets

| S-EPMC4222489 | biostudies-literature
| S-EPMC6775653 | biostudies-literature
| S-EPMC3272009 | biostudies-literature
| S-EPMC5984581 | biostudies-literature
| S-EPMC4931447 | biostudies-literature
| S-EPMC5850866 | biostudies-other
| S-EPMC310497 | biostudies-literature
| S-EPMC5768174 | biostudies-literature
| S-EPMC8661426 | biostudies-literature
| S-EPMC4368519 | biostudies-other