When is best-worst best? A comparison of best-worst scaling, numeric estimation, and rating scales for collection of semantic norms

Behav Res Methods. 2018 Feb;50(1):115-133. doi: 10.3758/s13428-017-1009-0.

Abstract

Large-scale semantic norms have become both prevalent and influential in recent psycholinguistic research. However, little attention has been directed towards understanding the methodological best practices of such norm collection efforts. We compared the quality of semantic norms obtained through rating scales, numeric estimation, and a less commonly used judgment format called best-worst scaling. We found that best-worst scaling usually produces norms with higher predictive validities than other response formats, and does so while requiring less data to be collected overall. We also found evidence that the various response formats may be producing qualitatively, rather than just quantitatively, different data. This raises the issue of potential response format bias, which has not been addressed by previous efforts to collect semantic norms, likely because of previous reliance on a single type of response format for a single type of semantic judgment. We have made available software for creating best-worst stimuli and scoring best-worst data. We have also made available new norms for age of acquisition, valence, arousal, and concreteness collected using best-worst scaling. These norms include entries for 1,040 words, of which 1,034 are also contained in the ANEW norms (Bradley & Lang, 1999, Affective norms for English words (ANEW): Instruction manual and affective ratings, pp. 1-45, Technical Report C-1, The Center for Research in Psychophysiology, University of Florida).
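
To make the idea of scoring best-worst data concrete, the following is a minimal sketch of a simple counting scheme: each item's score is the number of times it was chosen as best minus the number of times it was chosen as worst, divided by the number of trials in which it appeared. This is an illustrative assumption, not necessarily the scoring method implemented in the authors' software. The function name `best_worst_counts`, the trial layout, and the example words and choices are all hypothetical.

```python
from collections import defaultdict


def best_worst_counts(trials):
    """Score items as (times best - times worst) / times shown.

    `trials` is an iterable of (items, best, worst) tuples, where `items`
    is the tuple of words shown together on one trial (hypothetical layout).
    """
    shown = defaultdict(int)  # how often each item appeared
    net = defaultdict(int)    # best choices minus worst choices
    for items, best, worst in trials:
        for item in items:
            shown[item] += 1
        net[best] += 1
        net[worst] -= 1
    return {item: net[item] / shown[item] for item in shown}


# Made-up valence judgments, for illustration only.
trials = [
    (("puppy", "table", "war", "cake"), "puppy", "war"),
    (("puppy", "table", "tax", "cake"), "cake", "tax"),
    (("war", "table", "tax", "puppy"), "puppy", "war"),
]
scores = best_worst_counts(trials)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:+.2f}")
```

Under this scheme scores fall between -1 (always chosen as worst) and +1 (always chosen as best), and each trial yields information about every item shown, not only the two selected.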

Keywords: Best-worst scaling; Numeric estimation; Rating scales; Semantic judgment; Semantics.

MeSH terms

  • Arousal / physiology
  • Humans
  • Judgment / physiology*
  • Learning / physiology
  • Psycholinguistics / methods*
  • Relative Value Scales
  • Reproducibility of Results
  • Semantics*