Dataset Information

BPGA- an ultra-fast pan-genome analysis pipeline.


ABSTRACT: Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a paradigm shift in microbial genomics, from comparisons of a few genomes to large-scale pan-genome studies at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating the genomic diversity of a dataset, determining the core (conserved), accessory (dispensable) and unique (strain-specific) gene pools of a species, tracing horizontal gene flux across strains and providing insight into species evolution. Existing pan-genome software tools suffer from various limitations, such as limited dataset sizes, difficult installation or demanding requirements, and inadequate functional features. Here we present BPGA (Bacterial Pan Genome Analysis tool), an ultra-fast computational pipeline with seven functional modules. In addition to the routine pan-genome analyses, BPGA introduces a number of novel features for downstream analyses, such as core/pan/MLST (Multi-Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains, subset analysis, atypical G + C content analysis, and KEGG & COG mapping of core, accessory and unique genes. Other notable features include minimal running prerequisites, freedom to select the gene clustering method, ultra-fast execution, a user-friendly command-line interface and high-quality graphics outputs. The performance of BPGA has been evaluated using a dataset of complete genome sequences of 28 Streptococcus pyogenes strains.
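
The core/accessory/unique partition described in the abstract can be illustrated with a minimal sketch. This is not BPGA's implementation. It assumes gene clustering has already been done and that each cluster is represented as the set of strains carrying it. The function name, cluster IDs and strain labels below are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(
    clusters: Dict[str, Set[str]], strains: Set[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene clusters into core, accessory and unique sets.

    clusters maps a (hypothetical) cluster ID to the set of strains
    in which at least one member gene of that cluster was found.
    """
    core: Set[str] = set()
    accessory: Set[str] = set()
    unique: Set[str] = set()
    for cluster_id, present_in in clusters.items():
        # Count only strains that belong to the analysed dataset.
        n = len(present_in & strains)
        if n == len(strains):
            core.add(cluster_id)        # present in every strain
        elif n == 1:
            unique.add(cluster_id)      # present in exactly one strain
        elif n > 1:
            accessory.add(cluster_id)   # present in some, but not all
        # n == 0: cluster absent from the dataset, ignored
    return core, accessory, unique


# Toy dataset: three strains, four gene clusters (illustrative values only).
strains = {"A", "B", "C"}
clusters = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"B", "C"},
}
core, accessory, unique = partition_pan_genome(clusters, strains)
print(sorted(core), sorted(accessory), sorted(unique))
```

In this toy input, `c1` falls in the core set, `c2` and `c4` in the accessory set, and `c3` in the unique set. The pan-genome is the union of all three.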

SUBMITTER: Chaudhari NM 

PROVIDER: S-EPMC4829868 | biostudies-literature | 2016 Apr

REPOSITORIES: biostudies-literature

Publications

BPGA- an ultra-fast pan-genome analysis pipeline.

Chaudhari Narendrakumar M, Gupta Vinod Kumar, Dutta Chitra

Scientific Reports, 2016-04-13


Similar Datasets

| S-EPMC5780747 | biostudies-literature
| S-EPMC4363492 | biostudies-literature
| S-EPMC6670167 | biostudies-literature
| S-EPMC4589466 | biostudies-literature
| S-EPMC6964052 | biostudies-literature
| S-EPMC3268234 | biostudies-literature
| S-EPMC7320602 | biostudies-literature
| S-EPMC6635410 | biostudies-literature
2016-12-06 | GSE60865 | GEO
| S-EPMC6041978 | biostudies-literature