Comparing distributions of color words: pitfalls and metric choices

PLoS One. 2014 Feb 25;9(2):e89184. doi: 10.1371/journal.pone.0089184. eCollection 2014.

Abstract

Computational methods have started playing a significant role in semantic analysis. One particularly accessible area for developing good computational methods for linguistic semantics is color naming, where perceptual dissimilarity measures provide a geometric setting for the analyses. This setting was first studied by Berlin & Kay in 1969, and later through a large data collection effort: the World Color Survey (WCS). The WCS makes available, for further research, a dataset on color naming by 2,616 speakers of 110 different languages. In the analysis of color naming from the WCS, however, the choice of analysis method is itself an important factor. We demonstrate concrete problems with the choice of metrics made in recent analyses of WCS data, and offer approaches for dealing with the problems we identify. Picking a metric for the space of color naming distributions that ignores perceptual distances between colors assumes a decorrelated system, whereas strong spatial correlations in fact exist. We demonstrate that the corresponding issues are significantly reduced when using Earth Mover's Distance or Quadratic χ-square Distance, and that these solutions can be approximated with a kernel-based analysis method.
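
The point about metrics that ignore perceptual distance can be illustrated with a toy sketch. It is not the authors' implementation and does not use WCS data: it assumes a hypothetical one-dimensional row of six equally spaced color chips, and compares a bin-wise L1 distance with the one-dimensional Earth Mover's Distance, which on such a line equals the sum of absolute differences between the cumulative distributions. All names and values below are illustrative assumptions.

```python
# Toy illustration (assumed setup, not WCS data): naming distributions
# over six equally spaced color chips on a hypothetical 1-D axis.

def l1_distance(p, q):
    """Bin-wise L1 distance: compares each chip in isolation."""
    return sum(abs(a - b) for a, b in zip(p, q))

def emd_1d(p, q):
    """Earth Mover's Distance for histograms on a line with unit spacing.

    In one dimension it equals the sum of absolute differences between
    the two cumulative distributions.
    """
    total = 0.0
    running = 0.0  # mass that still has to be carried to later bins
    for a, b in zip(p, q):
        running += a - b
        total += abs(running)
    return total

# A color term used only for chip 0, for the adjacent chip 1,
# and for the most distant chip 5.
base = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

print("L1  base-near:", l1_distance(base, near))
print("L1  base-far: ", l1_distance(base, far))
print("EMD base-near:", emd_1d(base, near))
print("EMD base-far: ", emd_1d(base, far))
```

In this sketch the bin-wise distance rates the neighboring chip and the distant chip as equally different from the base term, while the Earth Mover's Distance grows with how far the mass has to move. Real color naming data live on a two-dimensional chip grid with perceptual distances between chips, so a general transport solver would be needed there.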

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Choice Behavior / physiology*
  • Color
  • Color Perception / physiology*
  • Humans
  • Language
  • Linguistics / methods
  • Semantics

Grants and funding

MVJ was mostly supported by the 7th Framework Programme through the project Toposys (FP7-ICT-318493-STREP, http://www.toposys.eu) and partially by Knut och Alice Wallenbergs Stiftelse: Unga Forskare (http://www.wallenberg.com/kaw). SV was supported through Stockholm University and the FoSprak graduate school. CHE was supported through the project Tomsy (IST-FP7-270436, http://www.tomsy.eu). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.