Dataset Information

Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees.


ABSTRACT: Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960s. In bioinformatics, psychometrics, and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and "generalizability" of these summaries. This article describes the implementation of the geometric distance between trees developed by Billera, Holmes, and Vogtmann (2001), which applies equally to phylogenetic trees and hierarchical clustering trees, and shows some of its applications in evaluating tree estimates. In particular, since Billera et al. (2001) have shown that the space of trees is nonpositively curved (a CAT(0) space), a collection of trees can naturally be represented as a tree. We compare this representation to the Euclidean approximations of treespace made available through both a classical multidimensional scaling and a kernel multidimensional scaling of the matrix of the distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence of both certain columns (positions, variables, or genes) and certain rows (species, observations, or arrays) on the construction of such trees. It can also provide a way of detecting heterogeneous mixtures in the input data. Supplementary materials for this article are available online.
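
The classical multidimensional scaling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a matrix of pairwise tree distances has already been computed by some other means, and the function name `classical_mds` and the 4 x 4 matrix `tree_dist` are hypothetical, with illustrative numbers that do not come from the article. The sketch double-centers the squared distances and uses the leading eigenvectors as coordinates.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Embed n items in k dimensions from an n x n distance matrix (classical/Torgerson MDS)."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric; eigh returns eigenvalues in ascending order, so reorder to descending
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[order], vecs[:, order]
    # Negative eigenvalues indicate non-Euclidean structure; clip them to zero before scaling
    coords = vecs * np.sqrt(np.clip(vals, 0.0, None))
    return coords, vals

# Hypothetical distances between four trees (illustrative values only, not from the article)
tree_dist = np.array([
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
])
coords, eigenvalues = classical_mds(tree_dist, k=2)
```

Each row of `coords` is a planar position for one tree. Because distances in a CAT(0) treespace are generally not Euclidean, real tree-distance matrices may produce negative eigenvalues, and their size gives a rough indication of how much the Euclidean approximation distorts the original distances.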

SUBMITTER: Chakerian J 

PROVIDER: S-EPMC7518125 | biostudies-literature | 2012

REPOSITORIES: biostudies-literature

Publications

Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees.

Chakerian John, Holmes Susan

Journal of Computational and Graphical Statistics (a joint publication of the American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America), 2012-08-16, issue 3


Similar Datasets

| S-EPMC6705769 | biostudies-literature
| S-EPMC6057528 | biostudies-literature
| S-EPMC4817050 | biostudies-literature
| S-EPMC4529085 | biostudies-literature
2022-11-19 | E-MTAB-8173 | biostudies-arrayexpress
| S-EPMC3813836 | biostudies-other
| S-EPMC5432190 | biostudies-literature
| S-EPMC3669789 | biostudies-other
| S-EPMC5447242 | biostudies-literature