New algorithms assessing short summaries in expository texts using latent semantic analysis

Behav Res Methods. 2009 Aug;41(3):944-50. doi: 10.3758/BRM.41.3.944.

Abstract

In this study, we compared four expert graders with latent semantic analysis (LSA) in assessing short summaries of an expository text. As is well known, LSA has technical difficulties in establishing a good semantic representation when it analyzes short texts. In order to improve the reliability of LSA relative to human graders, we analyzed three new algorithms using two holistic methods from previous research (León, Olmos, Escudero, Cañas, & Salmerón, 2006). The three new algorithms were (1) the semantic common network algorithm, an adaptation of an algorithm proposed by W. Kintsch (2001, 2002) that treats LSA as a dynamic model of semantic representation; (2) a best-dimension reduction measure of the latent semantic space, which selects the dimensions that contribute most to improving the LSA assessment of summaries (Hu, Cai, Wiemer-Hastings, Graesser, & McNamara, 2007); and (3) the Euclidean distance measure, used by Rehder et al. (1998), which incorporates both vector length and the cosine measure. A total of 192 Spanish middle-grade students and 6 experts took part in the study. They read an expository text and produced a short summary. Results showed that LSA was significantly more reliable as a computerized assessment tool for expository text when it used a best-dimension algorithm rather than the standard LSA algorithm. The semantic common network algorithm also showed promising results.
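The standard LSA pipeline the abstract builds on can be sketched briefly. This is an illustrative toy example, not the authors' implementation: it builds a small term-document matrix (counts invented for illustration), projects documents into a truncated latent space via SVD (the space whose dimensions the best-dimension algorithm would select among), and scores document pairs with the cosine measure and the Euclidean distance measure mentioned in the abstract.

```python
import numpy as np

def truncated_svd_docs(tdm, k):
    """Project the documents of a term-document matrix into a
    k-dimensional latent semantic space via truncated SVD."""
    u, s, vt = np.linalg.svd(tdm, full_matrices=False)
    # Document vectors in the latent space: rows of V_k scaled by
    # the corresponding singular values.
    return vt[:k].T * s[:k]

def cosine(a, b):
    """Standard LSA similarity: cosine of the angle between vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean(a, b):
    """Euclidean distance is sensitive to vector length as well as
    angle, which is why it complements the cosine measure."""
    return float(np.linalg.norm(a - b))

# Toy term-document matrix: 5 terms x 4 documents (invented counts).
tdm = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 1, 1],
    [0, 0, 2, 1],
    [1, 0, 0, 2],
], dtype=float)

docs = truncated_svd_docs(tdm, k=2)   # one 2-D vector per document
sim = cosine(docs[0], docs[2])        # angle-based similarity
dist = euclidean(docs[0], docs[2])    # length-sensitive distance
print(round(sim, 3), round(dist, 3))
```

In a real grading setting, a student summary would be folded into the same latent space and compared against an expert summary or the source text; the choice of k (and, in the best-dimension variant, which dimensions to keep) is what the abstract reports as decisive for reliability.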

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Algorithms*
  • Behavioral Research / methods*
  • Comprehension
  • Humans
  • Models, Statistical
  • Reading
  • Semantics*