Domain-Weighted Majority Voting for Crowdsourcing

IEEE Trans Neural Netw Learn Syst. 2019 Jan;30(1):163-174. doi: 10.1109/TNNLS.2018.2836969. Epub 2018 Jun 5.

Abstract

Crowdsourcing labeling systems provide an efficient way to generate multiple inaccurate labels for given observations. If the competence level, or "reputation," of every annotator (which can be interpreted as the probability of annotating the right label) is equal and biased toward the right label, majority voting (MV) is the optimal decision rule for merging the multiple labels into a single reliable one. However, in practice, the competence levels of annotators employed by crowdsourcing labeling systems often vary widely. In these cases, weighted MV is preferable, with the weights determined by the competence levels. However, since the annotators are anonymous and the ground-truth labels are usually unknown, it is hard to compute the competence levels of the annotators directly. In this paper, we propose to learn the weights for weighted MV by exploiting the expertise of annotators. Specifically, we model the domain knowledge of different annotators with different distributions and treat the crowdsourcing problem as a domain adaptation problem. The annotators provide labels to the source domains, and the target domain is assumed to be associated with the ground-truth labels. The weights are obtained by matching the source domains with the target domain. Although the target-domain labels are unknown, we prove that they can be estimated under mild conditions. Both theoretical and empirical analyses verify the effectiveness of the proposed method. Large performance gains are shown for specific data sets.
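
The contrast between MV and weighted MV described in the abstract can be illustrated with a minimal sketch. It assumes binary labels, independent annotators, and competence levels that are already known, and it uses the classical log-odds weighting for that setting. It does not implement the domain-matching procedure the paper proposes for learning the weights. All function names and the example numbers are hypothetical.

```python
import math
from collections import defaultdict


def weighted_majority_vote(labels, weights):
    """Return the label with the largest total weight.

    Ties resolve to whichever tied label was seen first.
    """
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


def majority_vote(labels):
    """Plain MV: every annotator carries the same weight."""
    return weighted_majority_vote(labels, [1.0] * len(labels))


def log_odds_weights(competences):
    """Assumed weighting: log(p / (1 - p)) for each competence p.

    Each p must lie strictly between 0 and 1. Annotators with
    p > 0.5 get a positive weight; p = 0.5 gets weight zero.
    """
    return [math.log(p / (1.0 - p)) for p in competences]


# Hypothetical example: one strong annotator disagrees with two weak ones.
labels = [1, 0, 0]
competences = [0.95, 0.6, 0.6]

mv_label = majority_vote(labels)
wmv_label = weighted_majority_vote(labels, log_odds_weights(competences))
print(mv_label, wmv_label)
```

In this example plain MV follows the two weaker annotators, while the weighted vote follows the single strong one, because log(0.95/0.05) exceeds 2 · log(0.6/0.4). When all competences are equal, the weights are equal and the two rules coincide.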

Publication types

  • Research Support, Non-U.S. Gov't