Predicting Lexical Norms: A Comparison between a Word Association Model and Text-Based Word Co-occurrence Models

Hendrik Vankrunkelsven; Steven Verheyen; Gert Storms; Simon De Deyne

doi:10.5334/joc.50

J Cogn. 2018 Nov 27;1(1):45. doi: 10.5334/joc.50.

Authors

Hendrik Vankrunkelsven¹, Steven Verheyen¹, Gert Storms¹, Simon De Deyne¹,²

Affiliations

¹ Laboratory of Experimental Psychology, KU Leuven, BE.
² Computational Cognitive Science Lab, University of Melbourne, AU.

PMID: 31517218
PMCID: PMC6634333
DOI: 10.5334/joc.50

Abstract

In two studies we compare a distributional semantic model derived from word co-occurrences and a word association-based model in their ability to predict properties that affect lexical processing. We focus on age of acquisition, concreteness, and three affective variables (valence, arousal, and dominance), since all of these variables have been shown to be fundamental to word meaning. In both studies we use a model based on data obtained in a continued free word association task to predict these variables. In Study 1 we directly compare this model to a word co-occurrence model based on syntactic dependency relations to see which model better predicts the variables under scrutiny in Dutch. In Study 2 we replicate our findings in English and compare our results to those reported in the literature. In both studies we find the word association-based model well suited to predicting diverse word properties. Especially in the case of affective word properties, we show that the association model is superior to the distributional model.

Keywords: affective word characteristics; age of acquisition; concreteness; k-nearest neighbors; lexical norms; word associations.
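The keywords mention k-nearest neighbors, but the abstract does not spell out the prediction procedure. The following is a minimal sketch of one way such a prediction could work, assuming each word is represented by a vector (from either an association or a co-occurrence model) and that a word's unknown norm is estimated as the mean norm of its k most similar words under cosine similarity. The function names, the toy vectors, and the ratings are all hypothetical and chosen for illustration only; they are not data or code from the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(target, vectors, norms, k=2):
    """Estimate the norm of `target` as the mean norm of its k nearest neighbors.

    Assumption: unweighted averaging; the study may have used a different scheme.
    """
    # Only words with both a vector and a known rating can serve as neighbors.
    candidates = [w for w in norms if w != target and w in vectors]
    if not candidates:
        raise ValueError("no rated neighbors available")
    ranked = sorted(
        candidates,
        key=lambda w: cosine(vectors[target], vectors[w]),
        reverse=True,
    )
    neighbors = ranked[:k]
    return sum(norms[w] for w in neighbors) / len(neighbors)


# Made-up 3-dimensional vectors and made-up valence ratings (illustrative only).
vectors = {
    "happy": [1.0, 0.1, 0.0],
    "joy": [0.9, 0.2, 0.0],
    "smile": [0.8, 0.0, 0.1],
    "grief": [0.0, 1.0, 0.1],
    "sad": [0.1, 0.9, 0.0],
}
norms = {"joy": 8.0, "smile": 7.0, "grief": 2.0, "sad": 2.5}

predicted = knn_predict("happy", vectors, norms, k=2)
print(predicted)
```

In this toy setup the two nearest rated neighbors of "happy" are "joy" and "smile", so the estimate is the mean of their ratings. Comparing two models would then amount to running the same procedure with each model's vectors and correlating the predictions with human ratings.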

Grants and funding

The reported work was sponsored by University of Leuven Research Council grant C14/16032 awarded to GS and by ARC grants DE140101749 and DP150103280 awarded to SDD. The publication was sponsored by the KU Leuven Fund for Fair Open Access.

All four authors developed the study concept. HV performed the data analysis and drafted the manuscript. SV, GS, and SDD provided critical revisions. All authors approved the final version of the manuscript for submission.