Dataset Information

MPI-PHYLIP: parallelizing computationally intensive phylogenetic analysis routines for the analysis of large protein families.

ABSTRACT:

Background

Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems, as well as into the origins and development of physiological features in present-day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in assessing the robustness of the trees produced for these analyses.

Methodology

Our focus was to increase the number of bootstrap replications that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We modified the PHYLIP package using MPI so that large-scale phylogenetic studies of protein sequences, with a statistically robust number of bootstrapped datasets, can be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports the performance, on several protein datasets, of the parallel PHYLIP programs that are relevant to the study of protein evolution.
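
As a rough illustration of why bootstrap replicates lend themselves to parallel execution, the following is a minimal sketch in plain Python, not the MPI-PHYLIP implementation. It resamples alignment columns with replacement and splits the replicate indices across workers in round-robin fashion. The function names, the round-robin scheme, the seeding strategy, and the toy alignment are all assumptions made for illustration; the abstract does not describe how MPI-PHYLIP actually distributes its work, and a real MPI job would run each rank as a separate process rather than in a loop.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def replicates_for_rank(n_replicates, rank, n_ranks):
    """Round-robin assignment of replicate indices to one worker (hypothetical scheme)."""
    return list(range(rank, n_replicates, n_ranks))


def run_rank(alignment, n_replicates, rank, n_ranks, base_seed=42):
    """Generate the replicates owned by one rank.

    Seeding each replicate by its index keeps the output independent of
    how many ranks are used.
    """
    out = {}
    for i in replicates_for_rank(n_replicates, rank, n_ranks):
        rng = random.Random(base_seed + i)
        out[i] = bootstrap_alignment(alignment, rng)
    return out


# Toy protein alignment (made-up sequences of equal length).
alignment = ["MKVLAT", "MKILAS", "MRVLGT"]
n_replicates = 10
n_ranks = 4

# The loop stands in for independent processes in a real parallel run.
results = {}
for rank in range(n_ranks):
    results.update(run_rank(alignment, n_replicates, rank, n_ranks))
```

Because each replicate depends only on the original alignment and its own seed, the workers need no communication until the per-replicate trees are gathered to build a consensus.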

Conclusions

Calculations that currently take a few days on a state-of-the-art desktop workstation are reduced to calculations that can be performed over lunchtime on a modern parallel computer. Of the three protein methods tested, the maximum likelihood method scales best, followed by the distance method and then the maximum parsimony method. However, the maximum likelihood method requires significant memory resources, which limits its application to more moderately sized protein datasets.

SUBMITTER: Ropelewski AJ

PROVIDER: S-EPMC2981553 | biostudies-literature | 2010 Nov

REPOSITORIES: biostudies-literature

Publications

MPI-PHYLIP: parallelizing computationally intensive phylogenetic analysis routines for the analysis of large protein families.

Ropelewski AJ, Nicholas HB, Gonzalez Mendez RR

PLoS One, 2010 Nov 15, issue 11

PMID: 21085574

Similar Datasets

Genome-scale phylogenetic function annotation of large and diverse protein families. | S-EPMC3205580 | biostudies-literature

AlignmentViewer: Sequence Analysis of Large Protein Families. | S-EPMC7570326 | biostudies-literature

Phylogenetic analysis of the insulin-like growth factor binding protein (IGFBP) and IGFBP-related protein gene families. | S-EPMC2241922 | biostudies-literature

learnMSA: learning and aligning large protein families. | S-EPMC9673500 | biostudies-literature

Exploring parallel MPI fault tolerance mechanisms for phylogenetic inference with RAxML-NG. | S-EPMC9502163 | biostudies-literature

Phylogenetic analysis of the MS4A and TMEM176 gene families. | S-EPMC2826416 | biostudies-literature

A computationally driven analysis of the polyphenol-protein interactome. | S-EPMC5797150 | biostudies-literature

An efficient algorithm for large-scale detection of protein families. | S-EPMC101833 | biostudies-literature

Large language models generate functional protein sequences across diverse families. | S-EPMC10400306 | biostudies-literature

Advancing Intact Protein Quantitation with Updated Deconvolution Routines. | S-EPMC10840078 | biostudies-literature