An experimental characterization of workers' behavior and accuracy in crowdsourced tasks

Evgenia Christoforou; Antonio Fernández Anta; Angel Sánchez

doi:10.1371/journal.pone.0252604

PLoS One. 2021 Jun 16;16(6):e0252604. doi: 10.1371/journal.pone.0252604. eCollection 2021.

Authors

Evgenia Christoforou¹, Antonio Fernández Anta², Angel Sánchez³ ⁴ ⁵

Affiliations

¹ Transparency in Algorithms Group, CYENS - Centre of Excellence, Nicosia, Cyprus.
² IMDEA Networks Institute, Leganés (Madrid), Spain.
³ Grupo Interdisciplinar de Sistemas Complejos (GISC), Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés (Madrid), Spain.
⁴ Institute UC3M-BS of Financial Big Data (IBiDat), Universidad Carlos III de Madrid, Getafe (Madrid), Spain.
⁵ Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Universidad de Zaragoza, Zaragoza, Spain.

Abstract

Crowdsourcing systems are evolving into a powerful tool of choice for dealing with repetitive or lengthy human-based tasks. Prominent among them is Amazon Mechanical Turk (MTurk), in which Human Intelligence Tasks (HITs) are posted by requesters and afterwards selected and executed by subscribed (human) workers on the platform. These HITs often serve research purposes. In this context, a very important question is how reliable the results obtained through these platforms are, in view of the limited control a requester has over the workers' actions. Various control techniques have been proposed, but they are not free from shortcomings, and their use must be accompanied by a deeper understanding of the workers' behavior. In this work, we attempt to interpret the workers' behavior and reliability level in the absence of control techniques. To do so, we perform a series of experiments with 600 distinct MTurk workers, specifically designed to elicit the worker's level of dedication to a task, according to the task's nature and difficulty. We show that the time required by a worker to carry out a task correlates with its difficulty, and also with the quality of the outcome. We find that there are different types of workers. While some of them are willing to invest a significant amount of time to arrive at the correct answer, we also observe a significant fraction of workers who reply with a wrong answer. For the latter, the difficulty of the task and the very short time they took to reply suggest that they intentionally did not even attempt to solve the task.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Crowdsourcing / methods*
Humans
Reproducibility of Results
Task Performance and Analysis

Grants and funding

AS was supported in part by grants PGC2018-098186-B-I00 (BASIC, FEDER/MICINN-AEI, https://www.ciencia.gob.es/portal/site/MICINN/aei), PRACTICO-CM (Comunidad de Madrid, https://www.comunidad.madrid/servicios/educacion/convocatorias-ayudas-investigacion), and CAVTIONS-CM-UC3M (Comunidad de Madrid/Universidad Carlos III de Madrid, https://www.comunidad.madrid/servicios/educacion/convocatorias-ayudas-investigacion). AFA was supported by the Regional Government of Madrid (CM) grant 347 EdgeData-CM (P2018/TCS4499) cofunded by FSE & FEDER (https://www.comunidad.madrid/servicios/educacion/convocatorias-ayudas-investigacion), NSF of China grant 61520106005 (http://www.nsfc.gov.cn/english/site_1/index.html) and the Ministry of Science and Innovation (https://www.ciencia.gob.es/portal/site/MICINN/aei) grant PID2019-109805RB-I00 (ECID) cofunded by FEDER. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.