Dataset Information

Pan4Draft: A Computational Tool to Improve the Accuracy of Pan-Genomic Analysis Using Draft Genomes.


ABSTRACT: High-throughput sequencing technologies are a milestone in molecular biology: they have driven major advances in genomics by enabling large volumes of biological data to be deposited in public databases. The availability of such data has made comparative genomic analysis possible through pipelines that use the entire gene repertoire of genomes. However, public databases contain a large number of unfinished genomes, approximately 16-fold more than complete genomes, which biases comparative analyses. The present work therefore proposes Pan4Draft, an automated pipeline for pan-genomic analysis of draft prokaryotic genomes that uses reads from sequencing data to maximize the representation and accuracy of the gene repertoire of unfinished genomes. Pan4Draft supports comparative analyses under different strategies: combining complete and draft genomes, using only draft genomes, or using only complete genomes. Pan4Draft is available at http://www.computationalbiology.ufpa.br/pan4drafts and the test dataset is available at https://sourceforge.net/projects/pan4drafts.

SUBMITTER: Veras A 

PROVIDER: S-EPMC6018222 | biostudies-literature | 2018 Jun

REPOSITORIES: biostudies-literature

Publications

Pan4Draft: A Computational Tool to Improve the Accuracy of Pan-Genomic Analysis Using Draft Genomes.

Allan Veras, Fabricio Araujo, Kenny Pinheiro, Luis Guimarães, Vasco Azevedo, Siomar Soares, Artur da Costa da Silva, Rommel Ramos

Scientific Reports 2018-06-25 1



Similar Datasets

| S-EPMC9113259 | biostudies-literature
| S-EPMC2949892 | biostudies-literature
| S-EPMC5860061 | biostudies-literature
| S-EPMC3268234 | biostudies-literature
| S-EPMC7025898 | biostudies-literature
| S-EPMC2483986 | biostudies-literature
| S-EPMC151282 | biostudies-literature
| S-EPMC6902338 | biostudies-literature
| S-EPMC3001109 | biostudies-other
| S-EPMC3172244 | biostudies-literature