Sentiment annotations for 3827 simplified Chinese characters

Behav Res Methods. 2024 Feb;56(2):651-666. doi: 10.3758/s13428-023-02068-7. Epub 2023 Feb 8.

Abstract

Sentiment analysis in Chinese natural language processing has largely relied on words annotated with sentiment categories or scores. Characters, however, are the basic orthographic, phonological, and, in most cases, semantic units of the Chinese language. This study collected sentiment annotations for 3827 characters. The ratings showed high reliability and were validated against the ratings of some characters' word equivalents reported in a previous norming study. Relations with other lexico-semantic variables and with character processing efficiency were investigated. Furthermore, analyses of the association between constituent character valence and word valence revealed semantic compositionality and sentiment fusion characteristic of larger Chinese linguistic units. These character ratings expand current Chinese sentiment lexicons; they can be used for more precise stimulus assessment in research on Chinese character processing and for more efficient sentiment analysis that draws on annotations of single-character words.
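
The association between constituent character valence and word valence can be illustrated with a minimal sketch. It assumes, purely for illustration, that a word's valence is approximated by the mean valence of its characters; the abstract does not state which composition rule or rating scale the study used. All ratings below are made-up placeholder values, and the names `CHAR_VALENCE`, `WORD_VALENCE`, `composed_valence`, and `pearson_r` are hypothetical.

```python
from statistics import mean

# Hypothetical character valence ratings (placeholder values, assumed 1-7 scale).
CHAR_VALENCE = {"快": 5.2, "乐": 6.1, "悲": 1.8, "伤": 1.9, "平": 4.3, "静": 4.6}

# Hypothetical word valence ratings on the same assumed scale.
WORD_VALENCE = {"快乐": 6.4, "悲伤": 1.6, "平静": 4.9}


def composed_valence(word, char_valence):
    """Mean valence of a word's characters, or None if any character is unrated."""
    if any(ch not in char_valence for ch in word):
        return None
    return mean(char_valence[ch] for ch in word)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Pair each rated word with its character-based estimate, skipping unrated ones.
pairs = [
    (composed_valence(word, CHAR_VALENCE), rating)
    for word, rating in WORD_VALENCE.items()
    if composed_valence(word, CHAR_VALENCE) is not None
]
estimated, observed = zip(*pairs)
print(f"r = {pearson_r(estimated, observed):.3f}")
```

A high correlation on real ratings would be consistent with compositionality, while systematic deviations from the character mean would point to the kind of sentiment fusion the abstract mentions.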

Keywords: Affect representation; Chinese characters; Sentiment compositionality; Valence.

MeSH terms

  • Attitude
  • Humans
  • Language*
  • Linguistics
  • Reading
  • Reproducibility of Results
  • Semantics*