Dataset Information

ParGenes: a tool for massively parallel model selection and phylogenetic tree inference on thousands of genes.


ABSTRACT:

Motivation

Coalescent- and reconciliation-based methods are now widely used to infer species phylogenies from genomic data. They typically use per-gene phylogenies as input, which requires conducting multiple individual tree inferences on a large set of multiple sequence alignments (MSAs). At present, no easy-to-use parallel tool for this task exists. Ad hoc scripts for this purpose not only add implementation overhead but can also lead to poor resource utilization and long times-to-solution. We present ParGenes, a tool for simultaneously determining the best-fit model and inferring maximum likelihood (ML) phylogenies on thousands of independent MSAs using supercomputers.

Results

ParGenes executes common phylogenetic pipeline steps such as model-testing, ML inference(s), bootstrapping and computation of branch support values via a single parallel program invocation. We evaluated ParGenes by inferring >20 000 phylogenetic gene trees with bootstrap support values from Ensembl Compara and VectorBase alignments in 28 h on a cluster with 1024 nodes.
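To illustrate why the order in which independent per-gene jobs are dispatched affects resource utilization, here is a minimal sketch of a generic longest-job-first (LPT) greedy scheduler. It is not taken from ParGenes and makes no claim about how ParGenes schedules its work. The function name, the gene names and the cost values are all hypothetical, and the costs stand in for any per-MSA runtime estimate.

```python
import heapq


def schedule_largest_first(job_costs, n_workers):
    """Greedy longest-job-first (LPT) assignment of independent jobs.

    job_costs: dict mapping a job name to an estimated cost (hypothetical units).
    Returns (assignment, makespan), where assignment maps worker index -> jobs.
    """
    # Min-heap of (current load, worker index): the least-loaded worker pops first.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}

    # Visit jobs from most to least expensive so large jobs are placed early.
    for job, cost in sorted(job_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(job)
        heapq.heappush(heap, (load + cost, worker))

    # The makespan is the load of the busiest worker.
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical cost estimates for five alignments (illustration only).
costs = {"gene_a": 8.0, "gene_b": 7.0, "gene_c": 6.0, "gene_d": 5.0, "gene_e": 4.0}
assignment, makespan = schedule_largest_first(costs, n_workers=2)
print(assignment, makespan)
```

LPT is a heuristic and does not guarantee an optimal schedule, but placing expensive jobs first generally avoids the situation where one long job starts last and leaves the other workers idle.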

Availability and implementation

GNU GPL at https://github.com/BenoitMorel/ParGenes.

Supplementary information

Supplementary material is available at Bioinformatics online.

SUBMITTER: Morel B 

PROVIDER: S-EPMC6513153 | biostudies-literature | 2019 May

REPOSITORIES: biostudies-literature

Publications

ParGenes: a tool for massively parallel model selection and phylogenetic tree inference on thousands of genes.

Benoit Morel, Alexey M. Kozlov, Alexandros Stamatakis

Bioinformatics (Oxford, England), 2019-05-01, issue 10


Similar Datasets

| S-EPMC4166930 | biostudies-literature
| S-EPMC4737934 | biostudies-literature
| S-EPMC8357345 | biostudies-literature
| S-EPMC8042768 | biostudies-literature
| S-EPMC8300927 | biostudies-literature
| S-EPMC3637801 | biostudies-other
| S-EPMC3597552 | biostudies-literature
| S-EPMC3201883 | biostudies-literature
2011-08-01 | E-GEOD-26826 | biostudies-arrayexpress
| S-EPMC10182853 | biostudies-literature