Automatic Coding of Short Text Responses via Clustering in Educational Assessment

Educ Psychol Meas. 2016 Apr;76(2):280-303. doi: 10.1177/0013164415590022. Epub 2015 Jun 8.

Abstract

Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated using data collected in the Programme for International Student Assessment (PISA) 2012 in Germany. Free-text responses to 10 items, with [Formula: see text] responses in total, were analyzed. We further examined the effect of different methods, parameter values, and sample sizes on the performance of the implemented system. Agreement between the system and human codings ranged from fair to good up to excellent [Formula: see text]. In particular, items that are solved by naming specific semantic concepts were coded accurately. The system performed equally well with [Formula: see text] and somewhat poorer, but still acceptably, down to [Formula: see text]. Based on our findings, we discuss potential innovations for assessment that are enabled by automatic coding of short text responses.
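To make the general idea concrete, the following is a minimal sketch of clustering-based coding, not the system described above. It assumes bag-of-words vectors, a greedy single-pass clustering with a hypothetical similarity threshold, a majority vote of human codes per cluster, and Cohen's kappa as the agreement statistic. The abstract does not specify any of these choices, and the toy responses are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens (German letters allowed)."""
    return Counter(re.findall(r"[a-zäöüß]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.3):
    """Greedy single-pass clustering (an assumed stand-in technique).

    Each vector joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it opens a new cluster.
    """
    centroids, members = [], []
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best] = centroids[best] + vec  # summed counts act as centroid
            members[best].append(i)
        else:
            centroids.append(Counter(vec))
            members.append([i])
    return centroids, members


def label_clusters(members, human_codes):
    """Give each cluster the most frequent human code among its members."""
    return [Counter(human_codes[i] for i in idx).most_common(1)[0][0]
            for idx in members]


def code_response(text, centroids, cluster_codes):
    """Code an unseen response with the code of its most similar cluster."""
    vec = bag_of_words(text)
    sims = [cosine(vec, c) for c in centroids]
    return cluster_codes[max(range(len(sims)), key=sims.__getitem__)]


def cohens_kappa(a, b):
    """Chance-corrected agreement between two equally long code sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


# Invented toy data: (response, human code). Not taken from PISA.
train = [
    ("the ice melts because it is warm", "correct"),
    ("ice melts when it gets warm", "correct"),
    ("warm air melts the ice", "correct"),
    ("i do not know", "incorrect"),
    ("do not know the answer", "incorrect"),
    ("no idea", "incorrect"),
]
test = [
    ("it melts because the air is warm", "correct"),
    ("i know not", "incorrect"),
    ("no idea at all", "incorrect"),
    ("the ice is cold", "incorrect"),
]

centroids, members = cluster([bag_of_words(text) for text, _ in train])
cluster_codes = label_clusters(members, [code for _, code in train])
system_codes = [code_response(text, centroids, cluster_codes) for text, _ in test]
human_codes = [code for _, code in test]
kappa = cohens_kappa(system_codes, human_codes)

print(members)
print(system_codes)
print(kappa)
```

The last toy response shares surface vocabulary with the correct answers but has a different meaning, so this sketch miscodes it. Bag-of-words similarity captures named concepts more readily than reasoning, which is in line with the observation above that items solved by naming specific semantic concepts were coded most accurately.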

Keywords: automatic coding; automatic short-answer grading; computer-automated scoring; computer-based assessment.