Dataset Information

Comparing distributions of color words: pitfalls and metric choices.

ABSTRACT: Computational methods have started to play a significant role in semantic analysis. Color naming is a particularly accessible area for developing computational methods for linguistic semantics, because perceptual dissimilarity measures provide a geometric setting for the analyses. This setting was first studied by Berlin & Kay in 1969, and later through a large data collection effort, the World Color Survey (WCS). The WCS makes available, for further research, a dataset on color naming by 2,616 speakers of 110 different languages. In analyzing the WCS color-naming data, however, the choice of analysis method is an important factor in the outcome. We demonstrate concrete problems with the metrics chosen in recent analyses of WCS data, and offer approaches for dealing with the problems we identify. Choosing a metric on the space of color-naming distributions that ignores perceptual distances between colors amounts to assuming a decorrelated system, when strong spatial correlations in fact exist. We demonstrate that these issues are significantly mitigated when using the Earth Mover's Distance or the Quadratic χ-square Distance, and that these solutions can be approximated with a kernel-based analysis method.
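The central argument above is that a bin-by-bin metric treats every pair of distinct colors as equally far apart, while cross-bin metrics such as the Earth Mover's Distance take the ground distance between colors into account. The Python sketch below illustrates this on a toy example; it is not the authors' code. It assumes five equally spaced hue bins on a line with ground distance |i - j| (a hypothetical stand-in for perceptual distances between WCS color chips), and all helper names are illustrative. The Quadratic χ-square distance is assumed to follow the Quadratic-Chi definition of Pele & Werman (2010), with similarity matrix A_ij = 1 - d_ij / d_max and m = 0.5; the paper's kernel-based approximation is not reproduced.

```python
import math

def l1_distance(p, q):
    """Bin-by-bin L1 distance: each bin is compared only with itself."""
    return sum(abs(a - b) for a, b in zip(p, q))

def emd_1d(p, q):
    """Earth Mover's Distance between equal-mass histograms on equally spaced
    1-D bins with ground distance |i - j|. In 1-D this equals the L1 distance
    between the two cumulative distributions."""
    total, cum = 0.0, 0.0
    for a, b in zip(p, q):
        cum += a - b          # mass that still has to be carried past this bin
        total += abs(cum)
    return total

def similarity_matrix(n):
    """Assumed toy similarity A_ij = 1 - |i - j| / (n - 1) on a line of bins."""
    d_max = n - 1
    return [[1 - abs(i - j) / d_max for j in range(n)] for i in range(n)]

def quadratic_chi(p, q, A, m=0.5):
    """Quadratic-Chi distance (Pele & Werman 2010); 0/0 is treated as 0."""
    n = len(p)
    z = []
    for i in range(n):
        # Cross-bin normalizer: mass of both histograms weighted by similarity to bin i.
        norm = sum((p[c] + q[c]) * A[c][i] for c in range(n))
        z.append((p[i] - q[i]) / norm ** m if norm > 0 else 0.0)
    s = sum(z[i] * z[j] * A[i][j] for i in range(n) for j in range(n))
    return math.sqrt(max(s, 0.0))  # guard against tiny negative rounding error

# Toy naming distributions: all mass at one hue bin.
p = [1, 0, 0, 0, 0]
q = [0, 1, 0, 0, 0]  # adjacent hue
r = [0, 0, 0, 0, 1]  # distant hue
A = similarity_matrix(5)

print(l1_distance(p, q), l1_distance(p, r))  # 2 2
print(emd_1d(p, q), emd_1d(p, r))            # 1.0 4.0
print(round(quadratic_chi(p, q, A), 3), round(quadratic_chi(p, r, A), 3))  # 0.535 1.414
```

With the bin-by-bin L1 distance, shifting all mass to the adjacent hue costs exactly as much as shifting it to the far end of the scale; both cross-bin distances rank the adjacent shift as closer, which is the behavior a perceptually grounded comparison of color-naming distributions needs.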

SUBMITTER: Vejdemo-Johansson M

PROVIDER: S-EPMC3934892 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature


Publications

Comparing distributions of color words: pitfalls and metric choices.

Mikael Vejdemo-Johansson, Susanne Vejdemo, Carl-Henrik Ek

PLoS ONE, 25 February 2014, issue 2


PMID: 24586580

Similar Datasets

Project description: The system for colorimetry adopted by the Commission Internationale de l'Eclairage (CIE) in 1931, along with its subsequent improvements, represents a family of light-mixture models that has served well for many decades for stimulus specification and reproduction when highly controlled color standards are important. Still, with regard to color appearance, many perceptual and cognitive factors are known to contribute to color similarity and, in general, to all cognitive judgments of color. Using experimentally obtained odd-one-out triad similarity judgments from 52 observers, we demonstrate that CIE-based models can explain a good portion (but not all) of the color similarity data. Color difference quantified by CIELAB ΔE explained behavior at levels of 81% (across all colors), 79% (across red colors), and 66% (across blue colors). We show that the unexplained variation cannot be ascribed to inter- or intra-individual variation among the observers, and instead points to the presence of additional factors shared by the majority of respondents. Based on this, we create a quantitative model of the lexicographic-semiorder type, which shows how different perceptual and cognitive influences can trade off when making color similarity judgments. We show that by incorporating additional influences related to categorical, lightness, and saturation factors, the model explains more of the triad similarity behavior: 91% (all colors), 90% (reds), and 87% (blues). We conclude that distance in a CIE model is only the first of several layers in a hierarchy of higher-order cognitive influences that shape color triad choices. We further discuss additional mitigating influences outside the scope of CIE modeling that can be incorporated in this framework, including well-known influences from language, stimulus set effects, and color preference bias.
We also discuss universal and cultural aspects of the model as well as non-uniformity of the color space with respect to different cultural biases.
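For concreteness, the CIELAB ΔE referred to in this description can be computed as a simple distance. The description does not say which ΔE formula was used; this Python sketch assumes the original CIE 1976 definition (ΔE*ab, the Euclidean distance between L*a*b* coordinates), and the Lab values in the example are made up for illustration.

```python
import math

def delta_e_cie76(lab1, lab2):
    """CIE 1976 color difference (assumed variant): Euclidean distance
    between two (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)

# Hypothetical colors: a mid gray and a slightly lighter, redder sample.
print(delta_e_cie76((50.0, 0.0, 0.0), (53.0, 4.0, 0.0)))  # 5.0
```

Later formulas such as CIE94 and CIEDE2000 add weighting terms to compensate for the perceptual non-uniformity of CIELAB, so results computed with them would differ from this plain Euclidean version.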
