
Dataset Information


Estimating Bayesian Phylogenetic Information Content.


ABSTRACT: Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data.

Keywords: Bayesian; concatenation; conditional clade distribution; entropy; information; phylogenetics; saturation.
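The abstract's central quantity, prior entropy minus posterior entropy over tree topologies, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: trees are rooted and binary and written as nested tuples of taxon names (unrooted trees would first have to be rooted at a fixed taxon, which changes the topology count), the prior is taken to be uniform over rooted topologies, and all function names are hypothetical. The sketch contrasts a naive estimate built from raw topology frequencies in a posterior sample with an estimate built from a conditional clade distribution (CCD), in which each clade's split gets a probability conditional on that clade.

```python
import math
from collections import Counter, defaultdict
from functools import lru_cache

# Assumption: trees are rooted, binary, and written as nested 2-tuples of
# taxon names, e.g. (("A", "B"), ("C", "D")). Function names are hypothetical.


def log_num_rooted_topologies(n_taxa):
    """ln((2n-3)!!): log of the number of rooted binary topologies for n taxa."""
    return sum(math.log(k) for k in range(3, 2 * n_taxa - 2, 2))


def _canonical(node):
    """Order-independent key for a rooted tree, so (A,B) and (B,A) match."""
    if isinstance(node, str):
        return node
    return frozenset([_canonical(node[0]), _canonical(node[1])])


def sample_entropy(trees):
    """Naive estimate: entropy of raw topology frequencies in the sample."""
    counts = Counter(_canonical(t) for t in trees)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def _record_splits(node, split_counts):
    """Return the leaf set of node; count how each internal clade splits."""
    if isinstance(node, str):
        return frozenset([node])
    left = _record_splits(node[0], split_counts)
    right = _record_splits(node[1], split_counts)
    clade = left | right
    split_counts[clade][frozenset([left, right])] += 1
    return clade


def ccd_entropy(trees):
    """Entropy of the conditional clade distribution built from the sample.

    H(clade) = sum over observed splits s of p(s|clade) *
    (-ln p(s|clade) + H(child1) + H(child2)), with H = 0 for a single taxon.
    Assumes a non-empty sample of trees on the same taxon set.
    """
    split_counts = defaultdict(Counter)
    root = None
    for tree in trees:
        root = _record_splits(tree, split_counts)

    @lru_cache(maxsize=None)
    def h(clade):
        if len(clade) == 1:
            return 0.0
        splits = split_counts[clade]
        total = sum(splits.values())
        ent = 0.0
        for split, count in splits.items():
            p = count / total
            child1, child2 = tuple(split)
            ent += p * (-math.log(p) + h(child1) + h(child2))
        return ent

    return h(root)


def information(trees, n_taxa):
    """Prior entropy (uniform over rooted topologies) minus CCD posterior entropy."""
    return log_num_rooted_topologies(n_taxa) - ccd_entropy(trees)


# Hypothetical posterior sample of two 6-taxon trees.
t1 = ((("A", "B"), "C"), (("D", "E"), "F"))
t2 = ((("A", "C"), "B"), (("D", "F"), "E"))
sample = [t1, t2]
print(sample_entropy(sample), ccd_entropy(sample), information(sample, 6))
```

In this toy sample the two trees share the root split but resolve {A, B, C} and {D, E, F} differently. The CCD treats those two resolutions as independent and spreads probability evenly over four topologies (entropy 2 ln 2), whereas raw frequencies see only the two sampled trees (entropy ln 2). Because the CCD entropy is a sum of per-clade terms, the same recursion could also report how much each clade contributes.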

SUBMITTER: Lewis PO 

PROVIDER: S-EPMC5066063 | biostudies-literature | 2016 Nov

REPOSITORIES: biostudies-literature


Publications

Estimating Bayesian Phylogenetic Information Content.

Paul O. Lewis, Ming-Hui Chen, Lynn Kuo, Louise A. Lewis, Karolina Fučíková, Suman Neupane, Yu-Bo Wang, Daoyuan Shi

Systematic Biology, 2016-05-06, issue 6



Similar Datasets

| S-EPMC10603099 | biostudies-literature
| S-EPMC5010905 | biostudies-literature
| S-EPMC6156416 | biostudies-literature
| S-EPMC8782526 | biostudies-literature
| S-EPMC4151598 | biostudies-literature
2021-01-30 | GSE165812 | GEO
| S-EPMC3668646 | biostudies-other
| S-EPMC5544302 | biostudies-other
| S-EPMC2040160 | biostudies-literature
| S-EPMC3953559 | biostudies-other