Zipf's laws of meaning in Catalan

PLoS One. 2021 Dec 16;16(12):e0260849. doi: 10.1371/journal.pone.0260849. eCollection 2021.

Abstract

In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word and its number of meanings: the law of meaning distribution, relating the number of meanings of a word to its frequency rank, and the meaning-frequency law, relating the frequency of a word to its number of meanings. Although these laws were formulated more than half a century ago, they have been investigated in only a few languages. Here we present the first study of these laws in Catalan. We verify these laws in Catalan via the relationship among their exponents and that of the rank-frequency law. We present a new protocol for the analysis of these Zipfian laws that can be extended to other languages. We report the first evidence of two marked regimes for these laws in written language and speech, paralleling the two regimes in Zipf's rank-frequency law in large multi-author corpora discovered in the early 2000s. Finally, we discuss the implications of these two regimes.
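The relationship among the exponents can be made concrete with a short sketch. The symbols below are not given in the abstract and follow common notation in the literature: if frequency f decays with rank i as f ∝ i^(-alpha) (rank-frequency law) and the number of meanings mu decays as mu ∝ i^(-gamma) (law of meaning distribution), then substituting i ∝ f^(-1/alpha) gives mu ∝ f^(gamma/alpha), so the meaning-frequency exponent should satisfy delta = gamma/alpha. The Python snippet below checks this on synthetic, noise-free toy data. The values alpha = 1 and gamma = 1/2 are Zipf's classical values used purely for illustration, the helper `loglog_slope` is hypothetical, and plain least squares on log-log axes is a simplification that does not reproduce the paper's protocol.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x) (hypothetical helper)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

# Synthetic toy data, NOT taken from the paper or from any Catalan corpus.
ranks = list(range(1, 1001))
alpha_true, gamma_true = 1.0, 0.5          # assumed classical Zipfian values
freqs = [1e6 * i ** -alpha_true for i in ranks]      # f  ~ i^(-alpha)
meanings = [20.0 * i ** -gamma_true for i in ranks]  # mu ~ i^(-gamma)

alpha = -loglog_slope(ranks, freqs)     # rank-frequency exponent
gamma = -loglog_slope(ranks, meanings)  # meaning-distribution exponent
delta = loglog_slope(freqs, meanings)   # meaning-frequency exponent

print(f"alpha={alpha:.3f} gamma={gamma:.3f} "
      f"delta={delta:.3f} gamma/alpha={gamma / alpha:.3f}")
```

On real data the three exponents would be estimated separately and the identity would hold only approximately, so a noticeable gap between delta and gamma/alpha could be a sign of deviations from a single power-law regime.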

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Communication*
  • Data Mining
  • Humans
  • Language*
  • Linguistics / methods*
  • Models, Theoretical*
  • Semantics*
  • Spain
  • Speech*

Grants and funding

PRO2020-S03 (RCO03080 Lingüística Quantitativa) from Institut d’Estudis Catalans (https://www.iec.cat/). PRO2021-S03 HERNANDEZ from Institut d’Estudis Catalans (https://www.iec.cat/). JB, RFC and AHF are funded by the grant TIN2017-89244-R from Ministerio de Economia, Industria y Competitividad (Gobierno de España) (https://www.cnio.es/). JB, RFC and AHF are supported by the recognition 2017SGR-856 (MACDA) from AGAUR (Generalitat de Catalunya) (https://agaur.gencat.cat/ca/inici). The Institut d’Estudis Catalans (https://www.iec.cat/) provided the following datasets: (1) the normative dictionary of the Catalan language (DIEC2), and (2) the written corpus CTILC.