Predicting raters' transparency judgments of English and Chinese morphological constituents using latent semantic analysis

Hsueh-Cheng Wang; Li-Chuan Hsu; Yi-Min Tien; Marc Pomplun

doi:10.3758/s13428-013-0360-z

Behav Res Methods. 2014 Mar;46(1):284-306. doi: 10.3758/s13428-013-0360-z.

Authors

Hsueh-Cheng Wang¹, Li-Chuan Hsu, Yi-Min Tien, Marc Pomplun

Affiliation

¹ Department of Computer Science, University of Massachusetts Boston, Boston, MA, USA, hchengwang@gmail.com.

Abstract

The morphological constituents of English compounds (e.g., "butter" and "fly" for "butterfly") and two-character Chinese compounds may differ in meaning from the whole word. Subjective differences and ambiguity of transparency make judgments difficult, and a computational alternative based on a general model might be a way to average across subjective differences. In the present study, we propose two approaches based on latent semantic analysis (Landauer & Dumais in Psychological Review 104:211-240, 1997): Model 1 compares the semantic similarity between a compound word and each of its constituents, and Model 2 derives the dominant meaning of a constituent from a clustering analysis of morphological family members (e.g., "butterfingers" or "buttermilk" for "butter"). The proposed models successfully predicted participants' transparency ratings, and we recommend that experimenters use Model 1 for English compounds and Model 2 for Chinese compounds, on the basis of differences in raters' morphological processing in the different writing systems. The dominance of lexical meaning, semantic transparency, and the average similarity between all pairs within a morphological family are provided, and practical applications for future studies are discussed.
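
As a rough illustration of the idea behind Model 1, the sketch below scores each constituent by the cosine similarity between its vector and the compound's vector, and also computes the average similarity over all pairs in a morphological family. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the three-dimensional toy vectors are invented for illustration, and real LSA vectors would come from a semantic space trained on a corpus.

```python
from itertools import combinations

import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0
    return float(np.dot(u, v) / denom)


def constituent_similarities(compound_vec, constituent_vecs):
    """Similarity of the compound to each constituent, in the order given."""
    return [cosine_similarity(compound_vec, c) for c in constituent_vecs]


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy vectors, invented purely for illustration.
toy_vectors = {
    "butterfly": [0.1, 0.9, 0.2],
    "butter": [0.9, 0.1, 0.1],
    "fly": [0.2, 0.8, 0.3],
    "buttermilk": [0.8, 0.2, 0.1],
    "butterfingers": [0.3, 0.3, 0.9],
}

# Model 1 style score: compound versus each of its constituents.
scores = constituent_similarities(
    toy_vectors["butterfly"],
    [toy_vectors["butter"], toy_vectors["fly"]],
)

# Average similarity across a (toy) morphological family of "butter".
family = [toy_vectors[w] for w in ("butterfly", "buttermilk", "butterfingers")]
family_similarity = mean_pairwise_similarity(family)
```

With these made-up vectors, "fly" scores higher than "butter", which would correspond to "fly" being the more transparent constituent of "butterfly"; with real LSA vectors the ordering may differ.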

Publication types

Comparative Study
Research Support, N.I.H., Extramural

MeSH terms

Adult
Area Under Curve
Asian People
Female
Humans
Judgment*
Language*
Models, Psychological*
Models, Statistical*
Predictive Value of Tests
Psycholinguistics / methods*
Psycholinguistics / statistics & numerical data
ROC Curve
Semantics*
Vocabulary

Grants and funding

R01 EY021802/EY/NEI NIH HHS/United States