A Computational Approach to Identifying Cultural Keywords Across Languages

Cogn Sci. 2024 Jan;48(1):e13402. doi: 10.1111/cogs.13402.

Abstract

Distinctive aspects of a culture are often reflected in the meaning and usage of words in the language spoken by bearers of that culture. Keywords such as душа (soul) in Russian, hati (heart) in Indonesian and Malay, and gezellig (convivial/cosy/fun) in Dutch are held to be especially culturally revealing, and scholars have identified a number of such keywords using careful linguistic analyses (Peeters, 2020b; Wierzbicka, 1990). Because keywords are expected to have different statistical properties than related words in other languages, we argue that a quantitative comparison of word usage across languages can help to identify cultural keywords. To support this claim, we describe a computational method that compares word frequencies across languages, and apply it to both linguistic corpora and word association data. The method identifies culturally specific words that range from "obvious" examples, such as Amsterdam in Dutch, to non-obvious yet independently proposed examples, such as hati (heart) in Indonesian. We show in addition that linguistic corpora and word association data provide converging evidence about culturally specific words. Our results therefore show how computational analyses and behavioral experiments can supplement the methods previously used by linguists to identify culturally salient words across languages.
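The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of one way a cross-language frequency comparison might be set up, not the authors' method. It assumes that a word's relative frequency in a target language is compared with the mean relative frequency of its translation equivalents in other languages using a log ratio. The function names, the smoothing constant, and all counts are hypothetical and invented for illustration.

```python
import math


def relative_frequency(counts, word, smoothing=1e-6):
    """Relative frequency of `word` in a corpus, with a small additive
    constant (an assumption) so unseen words do not produce log(0)."""
    total = sum(counts.values())
    return counts.get(word, 0) / total + smoothing


def specificity(word, target_counts, comparisons, smoothing=1e-6):
    """Log ratio of the word's relative frequency in the target language to
    the mean relative frequency of its translation equivalents elsewhere.

    `comparisons` is a list of (counts, translation) pairs, one per
    comparison language. Positive scores mean the word is used more often
    in the target language than its counterparts are in the others.
    """
    target = relative_frequency(target_counts, word, smoothing)
    others = [relative_frequency(c, t, smoothing) for c, t in comparisons]
    baseline = sum(others) / len(others)
    return math.log(target / baseline)


# Toy counts, invented for illustration only (not data from the paper).
indonesian = {"hati": 120, "air": 80, "dan": 800}
english = {"heart": 30, "water": 85, "and": 885}
dutch = {"hart": 25, "water": 75, "en": 900}

scores = {
    "hati": specificity("hati", indonesian, [(english, "heart"), (dutch, "hart")]),
    "air": specificity("air", indonesian, [(english, "water"), (dutch, "water")]),
}

# Rank candidate words from most to least language-specific.
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {score:.2f}")
```

In this toy setup, hati scores well above zero because its invented count is several times higher than those of its translation equivalents, while air (water) scores near zero. The same scoring could in principle be applied to word association response counts in place of corpus counts, which is how the two data sources might be compared.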

Keywords: Cross-linguistic; Lexicon; Semantics; Word association.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Language*
  • Linguistics*