Grammatical Gender Disambiguates Syntactically Similar Nouns

Entropy (Basel). 2022 Apr 7;24(4):520. doi: 10.3390/e24040520.

Abstract

Recent research into grammatical gender from the perspective of information theory has shown how seemingly arbitrary gender systems can ease processing demands by guiding lexical prediction. When the gender of a noun is revealed in a preceding element, the list of possible candidates is reduced to the nouns assigned to that gender. This strategy can be particularly effective if it eliminates words that are likely to compete for activation against the intended word. We propose syntax as the crucial context within which words must be disambiguated, hypothesizing that syntactically similar words should be less likely to share a gender cross-linguistically. We draw on recent work on syntactic information in the lexicon to define the syntactic distribution of a word as a probability vector of its participation in various dependency relations, and we extract such relations for 32 languages from the Universal Dependencies Treebanks. Correlational and mixed-effects regression analyses reveal that syntactically similar nouns are less likely to share a gender, the opposite of the pattern found for semantically and orthographically similar words. We interpret this finding as a design feature of language, and this study adds to a growing body of research attesting to the different ways in which functional pressures on learning, memory, production, and perception shape the lexicon.

Keywords: corpus linguistics; grammatical gender; information theory; lexicon; syntax; usage-based.
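
Illustrative sketch (not part of the published abstract). The abstract defines the syntactic distribution of a word as a probability vector over the dependency relations in which it participates. The short Python example below shows one way such vectors could be built and compared. It is a minimal sketch under stated assumptions: the toy (noun, relation) pairs are invented for illustration, the function names are hypothetical, and cosine similarity is used only as a convenient stand-in, since the abstract does not name the similarity measure the authors used.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy data: (noun lemma, dependency relation) pairs, as they
# might be read off a dependency treebank. Not taken from the paper.
observations = [
    ("dog", "nsubj"), ("dog", "nsubj"), ("dog", "obj"), ("dog", "nmod"),
    ("cat", "nsubj"), ("cat", "nsubj"), ("cat", "obj"), ("cat", "nmod"),
    ("table", "obl"), ("table", "obl"), ("table", "obl"), ("table", "nmod"),
]

def syntactic_distributions(pairs):
    """Map each noun to a probability vector over dependency relations."""
    counts = defaultdict(Counter)
    for noun, rel in pairs:
        counts[noun][rel] += 1
    # Fix one ordering of relations so that all vectors are comparable.
    relations = sorted({rel for _, rel in pairs})
    dists = {}
    for noun, rel_counts in counts.items():
        total = sum(rel_counts.values())
        dists[noun] = [rel_counts[rel] / total for rel in relations]
    return relations, dists

def cosine_similarity(p, q):
    """Cosine similarity of two vectors (an assumed measure, see lead-in)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

relations, dists = syntactic_distributions(observations)
sim_dog_cat = cosine_similarity(dists["dog"], dists["cat"])
sim_dog_table = cosine_similarity(dists["dog"], dists["table"])
```

In this toy data, "dog" and "cat" occur in the same relations with the same relative frequencies, so their vectors are identical, while "table" occurs mostly as an oblique and is far less similar to either. Under the hypothesis stated in the abstract, a pair like "dog" and "cat" would be the kind expected to be less likely to share a gender.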