Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees.
ABSTRACT: Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960s. In bioinformatics, psychometrics, and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and "generalizability" of these summaries. This article describes the implementation of the geometric distance between trees developed by Billera, Holmes, and Vogtmann (2001), which is equally applicable to phylogenetic trees and hierarchical clustering trees, and shows some of its applications in evaluating tree estimates. In particular, since Billera et al. (2001) have shown that the space of trees is nonpositively curved (a CAT(0) space), a collection of trees can naturally be represented as a tree. We compare this representation to the Euclidean approximations of treespace made available through both a classical multidimensional scaling and a kernel multidimensional scaling of the matrix of distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence of both certain columns (positions, variables, or genes) and certain rows (species, observations, or arrays) on the construction of such trees. It can also provide a way of detecting heterogeneous mixtures in the input data. Supplementary materials for this article are available online.
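To make the "classical multidimensional scaling of the matrix of distances between trees" step concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes the pairwise tree distances have already been computed by some other tool and are supplied as a symmetric matrix; it does not compute the Billera-Holmes-Vogtmann distance itself. The function name `classical_mds` and its parameters are hypothetical. It uses only the standard library and finds eigenvectors by power iteration, which assumes the leading eigenvalues of the centered matrix are positive and well separated; since tree distances are generally not Euclidean, a full symmetric eigensolver would be the safer choice in practice.

```python
import math


def classical_mds(dist, k=2, iters=1000):
    """Embed n items in k dimensions from an n x n symmetric distance matrix.

    Hypothetical helper for illustration only. Returns a list of n
    coordinate lists, each of length k.
    """
    n = len(dist)
    # Squared distances.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    # Double centering: B = -1/2 * J * D^2 * J, with J = I - (1/n) * ones.
    # The matrix is symmetric, so column means equal row means.
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
         for j in range(n)]
        for i in range(n)
    ]

    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        # Power iteration from a fixed, non-constant start vector
        # (the constant vector lies in the null space of B).
        v = [float(i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract along this axis
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        if lam <= 1e-12:
            continue  # no positive variance left; leave this axis at zero
        # Coordinates on this axis are sqrt(eigenvalue) * eigenvector.
        for i in range(n):
            coords[i][axis] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

Calling `classical_mds(d, k=2)` on a matrix `d` of tree-to-tree distances returns one two-dimensional point per tree, which can then be plotted to look for outlying trees or clusters of trees.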
SUBMITTER: Chakerian J
PROVIDER: S-EPMC7518125 | biostudies-literature | 2012
REPOSITORIES: biostudies-literature