Creation of reliable relevance judgments in information retrieval systems evaluation experimentation through crowdsourcing: a review

ScientificWorldJournal. 2014:2014:135641. doi: 10.1155/2014/135641. Epub 2014 May 19.

Abstract

Test collections are used to evaluate information retrieval (IR) systems in laboratory-based evaluation experiments. In the classic setting, relevance judgments are generated by human assessors, which is costly and time-consuming. Researchers and practitioners still face the challenge of evaluating retrieval systems reliably and at low cost. Crowdsourcing, a novel method of data acquisition, is widely used in many research fields. It has been shown to be an inexpensive and quick solution, as well as a reliable alternative, for creating relevance judgments. One application of crowdsourcing in IR is judging the relevance of query-document pairs. For a crowdsourcing experiment to succeed, the relevance judgment tasks must be designed carefully, with an emphasis on quality control. This paper explores the factors that influence the accuracy of relevance judgments made by workers and how to improve the reliability of judgments in crowdsourcing experiments.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Consumer Behavior*
  • Crowdsourcing*
  • Decision Support Techniques*
  • Information Storage and Retrieval / methods*
  • Pattern Recognition, Automated / methods*
  • Software Validation*