When is best-worst best? A comparison of best-worst scaling, numeric estimation, and rating scales for collection of semantic norms

Geoff Hollis; Chris Westbury

doi:10.3758/s13428-017-1009-0

Behav Res Methods. 2018 Feb;50(1):115-133. doi: 10.3758/s13428-017-1009-0.

Authors

Geoff Hollis¹, Chris Westbury²

Affiliations

¹ Department of Computing Science, University of Alberta, 3-57 Athabasca Hall, Edmonton, AB, T6G 2E8, Canada. hollis@ualberta.ca.
² Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, AB, T6G 2E9, Canada.

PMID: 29322399
DOI: 10.3758/s13428-017-1009-0

Abstract

Large-scale semantic norms have become both prevalent and influential in recent psycholinguistic research. However, little attention has been directed toward understanding the methodological best practices of such norm collection efforts. We compared the quality of semantic norms obtained through rating scales, numeric estimation, and a less commonly used judgment format called best-worst scaling. We found that best-worst scaling usually produces norms with higher predictive validity than the other response formats, and does so while requiring less data to be collected overall. We also found evidence that the various response formats may produce qualitatively, rather than just quantitatively, different data. This raises the issue of potential response format bias, which has not been addressed by previous efforts to collect semantic norms, likely because of previous reliance on a single type of response format for a single type of semantic judgment. We have made available software for creating best-worst stimuli and scoring best-worst data. We have also made available new norms for age of acquisition, valence, arousal, and concreteness collected using best-worst scaling. These norms include entries for 1,040 words, of which 1,034 are also contained in the ANEW norms (Bradley & Lang, 1999, Affective norms for English words (ANEW): Instruction manual and affective ratings, pp. 1-45, Technical Report C-1, The Center for Research in Psychophysiology, University of Florida).
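
To make the idea of scoring best-worst data concrete, the sketch below shows simple best-minus-worst counting, one common way to turn best-worst trials into a per-item score. It is an illustration only and is not taken from the authors' software, which may use different scoring methods. The function name `score_best_worst`, the trial layout, and the example words and choices are all hypothetical.

```python
from collections import defaultdict


def score_best_worst(trials):
    """Score items by (times chosen best - times chosen worst) / times shown.

    `trials` is an iterable of (items, best, worst) tuples, where `items`
    is the set of words shown together on one trial. This layout is an
    assumption made for illustration, not the authors' data format.
    Scores range from -1.0 (always worst) to 1.0 (always best).
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    shown = defaultdict(int)

    for items, chosen_best, chosen_worst in trials:
        if chosen_best not in items or chosen_worst not in items:
            raise ValueError("best and worst choices must be among the items shown")
        if chosen_best == chosen_worst:
            raise ValueError("best and worst choices must differ")
        for item in items:
            shown[item] += 1
        best[chosen_best] += 1
        worst[chosen_worst] += 1

    return {item: (best[item] - worst[item]) / shown[item] for item in shown}


# Hypothetical valence judgments: which word is most and least pleasant?
trials = [
    (("puppy", "tax", "funeral", "cake"), "puppy", "funeral"),
    (("cake", "funeral", "tax", "rain"), "cake", "funeral"),
    (("puppy", "rain", "tax", "cake"), "puppy", "tax"),
]
scores = score_best_worst(trials)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {score:+.2f}")
```

Because each trial yields a judgment about every item shown (the best item outranks all others, and all others outrank the worst), a single response carries more ordinal information than a single rating, which is one intuition for why the format can need less data.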

Keywords: Best-worst scaling; Numeric estimation; Rating scales; Semantic judgment; Semantics.

MeSH terms

Arousal / physiology
Humans
Judgment / physiology*
Learning / physiology
Psycholinguistics / methods*
Relative Value Scales
Reproducibility of Results
Semantics*