The Tool for the Automatic Analysis of Cohesion 2.0: Integrating semantic similarity and text overlap

Behav Res Methods. 2019 Feb;51(1):14-27. doi: 10.3758/s13428-018-1142-4.

Abstract

This article introduces the second version of the Tool for the Automatic Analysis of Cohesion (TAACO 2.0). Like its predecessor, TAACO 2.0 is a freely available text analysis tool that works on the Windows, Mac, and Linux operating systems; is housed on a user's hard drive; is easy to use; and allows for batch processing of text files. TAACO 2.0 includes all the original indices reported for TAACO 1.0, but it adds a number of new indices related to local and global cohesion at the semantic level, reported by latent semantic analysis, latent Dirichlet allocation, and word2vec. The tool also includes a source overlap feature, which calculates lexical and semantic overlap between a source and a response text (i.e., cohesion between the two texts based measures of text relatedness). In the first study in this article, we examined the effects that cohesion features, prompt, essay elaboration, and enhanced cohesion had on expert ratings of text coherence, finding that global semantic similarity as reported by word2vec was an important predictor of coherence ratings. A second study was conducted to examine the source and response indices. In this study we examined whether source overlap between the speaking samples found in the TOEFL-iBT integrated speaking tasks and the responses produced by test-takers was predictive of human ratings of speaking proficiency. The results indicated that the percentage of keywords found in both the source and response and the similarity between the source document and the response, as reported by word2vec, were significant predictors of speaking quality. Combined, these findings help validate the new indices reported for TAACO 2.0.

Keywords: Coherence; Cohesion; Essay quality; Natural language processing; Speaking proficiency.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Humans
  • Linguistics*
  • Natural Language Processing
  • Semantics*
  • Software*