QAScore-An Unsupervised Unreferenced Metric for the Question Generation Evaluation

Tianbo Ji; Chenyang Lyu; Gareth Jones; Liting Zhou; Yvette Graham

doi:10.3390/e24111514

Entropy (Basel). 2022 Oct 24;24(11):1514. doi: 10.3390/e24111514.

Authors

Tianbo Ji¹, Chenyang Lyu², Gareth Jones¹, Liting Zhou¹, Yvette Graham³

Affiliations

¹ ADAPT Centre, School of Computing, Dublin City University, Dublin 9, Ireland.
² SFI Centre for Research Training in Machine Learning, School of Computing, Dublin City University, Dublin 9, Ireland.
³ ADAPT Centre, School of Computer Science and Statistics, Trinity College Dublin, Dublin 2, Ireland.

Abstract

Question Generation (QG) aims to automate the task of composing questions for a passage, given a set of chosen answers found within that passage. In recent years, neural generation models have substantially improved the quality of automatically generated questions, especially compared to traditional approaches that rely on manually crafted heuristics. However, current QG evaluation metrics rely solely on comparing generated questions with references, ignoring the passages and answers. These metrics are also widely criticized for their low agreement with human judgement. We therefore propose a new reference-free evaluation metric called QAScore, which provides a better mechanism for evaluating QG systems. QAScore evaluates a question by computing the cross entropy according to the probability that a language model can correctly generate the masked words in the answer to that question. Compared to existing metrics such as BLEU and BERTScore, QAScore achieves a stronger correlation with human judgement in our human evaluation experiment, meaning that applying QAScore to the QG task yields more accurate evaluation.
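The scoring idea described in the abstract can be illustrated with a minimal Python sketch. It assumes the score is the sum of log-probabilities that a masked language model assigns to each answer word when that word is masked and the passage, question, and remaining answer words are given as context. The abstract does not spell out the exact formula, so this is one plausible reading rather than the authors' implementation. The names `qascore_sketch`, `MaskedProb`, and `toy_masked_prob` are hypothetical, and the toy probability function merely stands in for a real pretrained language model.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface (an assumption, not from the paper): given the passage,
# the question, the answer tokens, and the index of the masked answer token,
# return the probability the model assigns to the true token at that position.
MaskedProb = Callable[[str, str, Sequence[str], int], float]


def qascore_sketch(
    passage: str,
    question: str,
    answer_tokens: Sequence[str],
    masked_prob: MaskedProb,
) -> float:
    """Sum of log-probabilities of each masked answer token.

    This is the negative of the cross entropy summed over answer tokens,
    so a higher (less negative) value suggests the question makes the
    answer easier for the model to recover.
    """
    if not answer_tokens:
        raise ValueError("answer_tokens must not be empty")
    total_log_prob = 0.0
    for index in range(len(answer_tokens)):
        # Mask one answer token at a time and query the model.
        p = masked_prob(passage, question, answer_tokens, index)
        if not 0.0 < p <= 1.0:
            raise ValueError("masked_prob must return a value in (0, 1]")
        total_log_prob += math.log(p)
    return total_log_prob


def toy_masked_prob(
    passage: str, question: str, answer_tokens: Sequence[str], index: int
) -> float:
    """Toy stand-in for a language model: it only checks for one keyword."""
    return 0.9 if "capital" in question.lower() else 0.1


passage = "Dublin is the capital of Ireland."
answer = ["Dublin"]
relevant = qascore_sketch(
    passage, "What is the capital of Ireland?", answer, toy_masked_prob
)
unrelated = qascore_sketch(
    passage, "Who wrote this passage?", answer, toy_masked_prob
)
print(relevant > unrelated)
```

In practice the toy function would be replaced by a call to a pretrained masked language model. No reference question appears anywhere in the computation, which is what makes a metric of this kind reference-free.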

Keywords: question generation; question generation evaluation; reference-free evaluation.

Grants and funding

Grants 13/RC/2106_P2; 13/RC/2106 / SFI Research Centres Programme