
Dataset Information


Cubic time algorithms of amalgamating gene trees and building evolutionary scenarios.


ABSTRACT:

Background

A long-recognized problem is the inference of the supertree S that amalgamates a given set {G(j)} of trees G(j), with the leaves in each G(j) being assigned homologous elements. We build on an approach that finds the tree S by minimizing the total cost of the mappings α(j) of individual gene trees G(j) into S. Traditionally, this cost is defined essentially as the sum of duplications and gaps in each α(j). The classical problem is to minimize the total cost, where S runs over the set of all trees that contain an exhaustive non-redundant set of species from all input G(j).

Results

We suggest a reformulation of the classical NP-hard problem of building a supertree in terms of the global minimization of the same cost functional, but only over species trees S that consist of clades belonging to a fixed set P (e.g., an exhaustive set of clades in all G(j)). We developed a deterministic algorithm with low-degree polynomial (typically cubic) time complexity with respect to the size of the input data. We define an extensive set of elementary evolutionary events and suggest an original definition of the mapping β of tree G into tree S. We introduce the cost functional c(G, S, f) and define the mapping β as the global minimum of this functional with respect to the variable f, in which sense it is a generalization of the classical mapping α. We suggest a reformulation of the classical NP-hard mapping (reconciliation) problem by introducing time slices into the species tree S and present a cubic-time algorithm to compute the mapping β. We introduce two novel definitions of the evolutionary scenario, based on either the mapping β or a random process of gene evolution along a species tree.
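To make the restriction to a fixed clade set P more concrete, here is a small sketch of how such a set could be collected from the input gene trees and how a candidate species tree could be checked against it. It rests on assumptions that go beyond the abstract: a clade is taken to be the set of species under a node, trees are encoded as nested tuples with species names at the leaves, and the function names `collect_clades` and `consists_of_clades` are hypothetical. It shows only the search-space restriction, not the cubic-time minimization itself.

```python
def collect_clades(trees):
    """Gather every clade (species set under a node) occurring in the trees."""
    clades = set()
    def walk(node):
        if isinstance(node, str):          # leaf: a single species
            current = frozenset([node])
        else:                              # inner node: union of child clades
            current = frozenset().union(*(walk(child) for child in node))
        clades.add(current)
        return current
    for tree in trees:
        walk(tree)
    return clades

def consists_of_clades(species_tree, allowed):
    """True if every clade of the candidate species tree belongs to `allowed`."""
    return collect_clades([species_tree]) <= allowed

gene_trees = [(("a", "b"), "c"), (("a", "c"), "b")]
P = collect_clades(gene_trees)

print(len(P))                                     # number of distinct clades
print(consists_of_clades((("a", "b"), "c"), P))   # candidate built from P
print(consists_of_clades((("b", "c"), "a"), P))   # uses a clade outside P
```

In this toy input the clade {b, c} occurs in no gene tree, so the third candidate falls outside the restricted search space.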

Conclusions

The developed algorithms are mathematically proved correct, which justifies the following statements. The supertree building algorithm finds exactly the global minimum of the total cost if only gene duplications and losses are allowed and the given set of gene trees satisfies a certain condition. The mapping algorithm finds exactly the minimal mapping β, the minimal total cost, and the evolutionary scenario as a minimum over all possible distributions of elementary evolutionary events along the edges of tree S. The algorithms and their efficient software implementations provide useful tools for many biological studies. They facilitate the processing of voluminous tree data in acceptable time while largely avoiding heuristics. The performance of the tools was tested on artificial and prokaryotic tree data.

Reviewers

This article was reviewed by Prof. Anthony Almudevar, Prof. Alexander Bolshoy (nominated by Prof. Peter Olofsson), and Prof. Marek Kimmel.

SUBMITTER: Lyubetsky VA 

PROVIDER: S-EPMC3577452 | biostudies-literature | 2012 Dec

REPOSITORIES: biostudies-literature


Publications

Cubic time algorithms of amalgamating gene trees and building evolutionary scenarios.

Lyubetsky VA, Rubanov LI, Rusin LY, Gorbunov KY

Biology Direct, 2012 Dec 22



Similar Datasets

| S-EPMC3141660 | biostudies-literature
| S-EPMC3982252 | biostudies-other
| S-EPMC7845980 | biostudies-literature
| S-EPMC149225 | biostudies-literature
| S-EPMC8065373 | biostudies-literature
| S-EPMC2831005 | biostudies-literature
| S-EPMC5853121 | biostudies-literature
| S-EPMC5432190 | biostudies-literature
| S-EPMC1691382 | biostudies-literature
| S-EPMC4780373 | biostudies-literature