Dataset Information

A new family of dissimilarity metrics for discrete character matrices that include inapplicable characters and its importance for disparity studies.

ABSTRACT: The use of discrete character data for disparity analyses has become more popular, partially due to the recognition that character data describe variation at large taxonomic scales, as well as the increasing availability of both character matrices co-opted from phylogenetic analysis and software tools. As taxonomic scope increases, the need to describe variation leads to some characters that may describe traits not found across all the taxa. In such situations, it is common practice to treat inapplicable characters as missing data when calculating dissimilarity matrices for disparity studies. For commonly used dissimilarity metrics like Wills's GED and Gower's coefficient, this can lead to the reranking of pairwise dissimilarities, resulting in taxa that share more primary character states being assigned larger dissimilarity values than taxa that share fewer. We introduce a family of metrics that proportionally weight primary characters according to the secondary characters that describe them, effectively eliminating this problem, and compare their performance to common dissimilarity metrics and previously proposed weighting schemes. When applied to empirical datasets, we confirm that choice of dissimilarity metric frequently affects the rank order of pairwise distances, differentially influencing downstream macroevolutionary inferences.
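The reranking described in the abstract can be illustrated with a small, hypothetical example. The sketch below is not the authors' code or their proposed metric. It only computes a Gower-style dissimilarity for unordered discrete characters, skipping any character that is inapplicable in either taxon (the "treat as missing" practice the abstract refers to). The three taxa and five characters are invented for illustration: character 1 is a primary character (for instance, a structure present or absent), and characters 2 and 3 are secondary characters that can only be scored when that structure is present.

```python
from typing import Optional, Sequence

# None marks a character that is inapplicable (and here treated as missing).
State = Optional[int]


def gower_dissimilarity(a: Sequence[State], b: Sequence[State]) -> float:
    """Proportion of mismatches among characters scored in both taxa."""
    comparable = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not comparable:
        raise ValueError("no characters are scored in both taxa")
    mismatches = sum(1 for x, y in comparable if x != y)
    return mismatches / len(comparable)


# Hypothetical matrix: [primary, secondary, secondary, other, other]
taxon_a = [1, 0, 0, 0, 0]        # structure present
taxon_b = [1, 1, 1, 0, 0]        # structure present, different secondary states
taxon_c = [0, None, None, 0, 0]  # structure absent, secondaries inapplicable

d_ab = gower_dissimilarity(taxon_a, taxon_b)  # 2 mismatches over 5 characters
d_ac = gower_dissimilarity(taxon_a, taxon_c)  # 1 mismatch over 3 characters

print(f"d(A, B) = {d_ab:.3f}")
print(f"d(A, C) = {d_ac:.3f}")
```

Under these assumptions, taxa A and B share the primary state yet receive a larger dissimilarity (2/5) than A and C (1/3), which differ in the primary character. This is the kind of rank-order effect that the weighting schemes discussed in the abstract are meant to address.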

SUBMITTER: Hopkins MJ

PROVIDER: S-EPMC6283942 | biostudies-literature | 2018 Nov

REPOSITORIES: biostudies-literature


Publications

A new family of dissimilarity metrics for discrete character matrices that include inapplicable characters and its importance for disparity studies.

Hopkins, Melanie J; St John, Katherine

Proceedings. Biological sciences, 2018 Nov 28, no. 1892

PMID: 30487309

Similar Datasets

Discrete and continuous character-based disparity analyses converge to the same macroevolutionary signal: a case study from captorhinids.
| S-EPMC5727480 | biostudies-literature

Shifting spaces: Which disparity or dissimilarity measurement best summarize occupancy in multidimensional spaces?
| S-EPMC7391566 | biostudies-literature

Resolving network clusters disparity based on dissimilarity measurements with nonmetric analysis of variance.
| S-EPMC10663764 | biostudies-literature

Early bursts of disparity and the reorganization of character integration.
| S-EPMC6253373 | biostudies-literature

Sensitivity of discrete symmetry metrics: Implications for metric choice.
| S-EPMC9119531 | biostudies-literature

The impact of fossil data on annelid phylogeny inferred from discrete morphological characters.
| S-EPMC5013799 | biostudies-literature

A Parallel Framework with Block Matrices of a Discrete Fourier Transform for Vector-Valued Discrete-Time Signals.
| S-EPMC4587485 | biostudies-other

Taxon incompleteness and discrete time bins affect character change rates in simulated data.
| S-EPMC7728684 | biostudies-literature

DiscML: an R package for estimating evolutionary rates of discrete characters using maximum likelihood.
| S-EPMC4261585 | biostudies-literature

A family of functional dissimilarity measures for presence and absence data.
| S-EPMC4984511 | biostudies-literature