
Dataset Information


Significant distinct branches of hierarchical trees: a framework for statistical analysis and applications to biological data.


ABSTRACT:

Background

One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity.

Results

We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. For each dataset there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques.
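As an illustration of what "a rational function of heights, both of the branch and of its parent" could look like, here is a minimal sketch. The specific form used below, (parent height - branch height) / parent height, is an assumption chosen for illustration and is not taken from the abstract; the function name `branch_tightness`, the merge-list layout, and the toy tree are likewise hypothetical. The sketch does not cover the significance test and is not the TBEST R package's interface.

```python
def branch_tightness(merges, n_leaves):
    """Compute an illustrative tightness score for each non-root branch.

    merges[i] = (left, right, height) describes the merge that creates
    internal node n_leaves + i; leaves are numbered 0 .. n_leaves - 1.
    The score (h_parent - h_node) / h_parent is an ASSUMED rational
    function of the two heights, used here only as an example.
    """
    # Height at which each internal node was formed.
    height = {n_leaves + i: h for i, (_, _, h) in enumerate(merges)}

    # Map every child (leaf or internal node) to its parent node.
    parent = {}
    for i, (left, right, _) in enumerate(merges):
        node = n_leaves + i
        parent[left] = node
        parent[right] = node

    tightness = {}
    for node, h in height.items():
        if node not in parent:
            continue  # the root has no parent, so no score is defined
        h_parent = height[parent[node]]
        # Guard against a zero parent height (all points identical).
        tightness[node] = (h_parent - h) / h_parent if h_parent > 0 else 0.0
    return tightness


# Hypothetical 5-leaf tree: two tight pairs joined far above their own heights.
merges = [
    (0, 1, 1.0),    # node 5
    (2, 3, 2.0),    # node 6
    (5, 6, 8.0),    # node 7
    (7, 4, 10.0),   # node 8 (root)
]
scores = branch_tightness(merges, n_leaves=5)
for node in sorted(scores):
    print(node, scores[node])
```

Under this assumed measure, a branch that forms at a low height relative to its parent scores close to 1, while a branch that forms just below its parent scores close to 0. Deciding whether an observed score is significant requires the separate statistical procedure the abstract describes.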

Conclusions

Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: http://www.cran.r-project.org/web/packages/TBEST/index.html.

SUBMITTER: Sun G 

PROVIDER: S-EPMC4253613 | biostudies-literature | 2014 Nov

REPOSITORIES: biostudies-literature


Publications

Significant distinct branches of hierarchical trees: a framework for statistical analysis and applications to biological data.

Sun, Guoli; Krasnitz, Alexander

BMC Genomics, 2014-11-19



Similar Datasets

| S-EPMC4000433 | biostudies-literature
| S-EPMC7615108 | biostudies-literature
| S-EPMC11249598 | biostudies-literature
| S-EPMC5547491 | biostudies-literature
| S-EPMC7278111 | biostudies-literature
| S-EPMC10334412 | biostudies-literature
| S-EPMC3218220 | biostudies-literature
| S-EPMC4608541 | biostudies-literature
| S-EPMC3114728 | biostudies-literature
| S-EPMC9116704 | biostudies-literature