Frequency effects in linear discriminative learning

Maria Heitmeier; Yu-Ying Chuang; Seth D Axen; R Harald Baayen

doi:10.3389/fnhum.2023.1242720


Front Hum Neurosci. 2024 Jan 8:17:1242720. doi: 10.3389/fnhum.2023.1242720. eCollection 2023.

Authors

Maria Heitmeier¹,², Yu-Ying Chuang¹, Seth D Axen², R Harald Baayen¹,²

Affiliations

¹ Quantitative Linguistics, University of Tübingen, Tübingen, Germany.
² Cluster of Excellence Machine Learning: New Perspectives for Science, University of Tübingen, Tübingen, Germany.

Abstract

Word frequency is a strong predictor in most lexical processing tasks. Thus, any model of word recognition needs to account for how word frequency effects arise. The Discriminative Lexicon Model (DLM) models lexical processing with mappings between words' forms and their meanings. Comprehension and production are modeled via linear mappings between the two domains. So far, the mappings within the model can be obtained either incrementally via error-driven learning, a computationally expensive process able to capture frequency effects, or via an efficient but frequency-agnostic solution that models the theoretical endstate of learning (EL), where all words are learned optimally. In the present study, we show how an efficient yet frequency-informed mapping between form and meaning can be obtained (frequency-informed learning; FIL). We find that FIL closely approximates an incremental solution while being computationally much cheaper. FIL shows a relatively low type accuracy and a high token accuracy, demonstrating that the model correctly processes most of the word tokens that speakers encounter in daily life. We use FIL to model reaction times in the Dutch Lexicon Project by means of a Gaussian Location Scale Model and find that FIL predicts the S-shaped relationship between frequency and the mean of reaction times well but underestimates the variance of reaction times for low-frequency words. FIL is also better able than EL to account for priming effects in an auditory lexical decision task in Mandarin Chinese. Finally, we use ordered data from CHILDES to compare mappings obtained with FIL and with incremental learning. We show that the mappings are highly correlated, but that with FIL some nuances based on word ordering effects are lost. Our results show how frequency effects in a learning model can be simulated efficiently, and raise questions about how best to account for low-frequency words in cognitive models.
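The three ways of obtaining a form-to-meaning mapping contrasted above can be illustrated with a small sketch. It assumes, following the usual DLM notation (an assumption here), a cue matrix C (one row of form features per word), a semantic matrix S (one row per word), and a comprehension mapping F with C F ≈ S. EL is sketched as ordinary least squares; FIL is sketched as frequency-weighted least squares, an assumption in line with the "weighted regression" keyword (whether and how raw frequencies are transformed is not stated here); incremental learning is sketched as Widrow-Hoff (delta rule) updates on tokens sampled in proportion to frequency. The function names, the correlation-based accuracy criterion, and the toy lexicon are hypothetical and randomly generated, not the authors' implementation or data.

```python
import numpy as np


def endstate_mapping(C, S):
    """EL sketch: frequency-agnostic least squares, F minimizing ||C @ F - S||^2."""
    F, *_ = np.linalg.lstsq(C, S, rcond=None)
    return F


def frequency_informed_mapping(C, S, freqs):
    """FIL sketch (assumed weighted least squares): minimize sum_i f_i * ||C[i] @ F - S[i]||^2.

    Scaling row i of C and S by sqrt(f_i) turns this into ordinary least squares.
    """
    w = np.sqrt(np.asarray(freqs, dtype=float))[:, None]
    F, *_ = np.linalg.lstsq(w * C, w * S, rcond=None)
    return F


def incremental_mapping(C, S, freqs, eta=0.02, n_tokens=20_000, seed=0):
    """Widrow-Hoff (delta rule) updates on word tokens sampled in proportion to frequency."""
    rng = np.random.default_rng(seed)
    p = np.asarray(freqs, dtype=float)
    p = p / p.sum()
    F = np.zeros((C.shape[1], S.shape[1]))
    for i in rng.choice(len(C), size=n_tokens, p=p):
        c, s = C[i:i + 1], S[i:i + 1]   # 1 x k form row, 1 x m meaning row
        F += eta * c.T @ (s - c @ F)    # reduce this token's prediction error
    return F


def _row_standardize(X):
    # Center and L2-normalize each row so that dot products are Pearson correlations.
    X = X - X.mean(axis=1, keepdims=True)
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def type_and_token_accuracy(C, S, F, freqs):
    """Hypothetical criterion: a word is recognized if its predicted meaning
    correlates most strongly with its own target row of S."""
    r = _row_standardize(C @ F) @ _row_standardize(S).T
    correct = r.argmax(axis=1) == np.arange(len(S))
    f = np.asarray(freqs, dtype=float)
    return correct.mean(), (correct * f).sum() / f.sum()


# Hypothetical toy lexicon: 40 words, 5 form cues, 10 semantic dimensions.
rng = np.random.default_rng(1)
n_words, k, m = 40, 5, 10
C = rng.normal(size=(n_words, k))
S = C @ rng.normal(size=(k, m)) + 0.3 * rng.normal(size=(n_words, m))
freqs = 1.0 / np.arange(1, n_words + 1)   # Zipf-like frequencies

F_el = endstate_mapping(C, S)
F_fil = frequency_informed_mapping(C, S, freqs)
F_inc = incremental_mapping(C, S, freqs)

for name, F in [("EL", F_el), ("FIL", F_fil)]:
    r = np.corrcoef(F.ravel(), F_inc.ravel())[0, 1]
    type_acc, token_acc = type_and_token_accuracy(C, S, F, freqs)
    print(f"{name}: corr with incremental = {r:.4f}, "
          f"type acc = {type_acc:.2f}, token acc = {token_acc:.2f}")
```

The weighted solution is computed in one step, whereas the delta rule needs many sampled tokens. With equal frequencies the weighted solution reduces to the unweighted one, which is one way to see why EL is frequency-agnostic. With a small learning rate, delta-rule learning on frequency-proportional samples settles near the minimizer of the frequency-weighted squared error, which is the intuition behind using a weighted regression as a cheap stand-in for incremental learning. It does not reproduce order effects, since the weighted solution ignores the sequence in which tokens arrive.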

Keywords: distributional semantics; incremental learning; lexical decision; linear discriminative learning; mental lexicon; weighted regression; word frequency.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy (EXC number 2064/1, Project Number 390727645) and by the European Research Council, project WIDE-742545.