Active Crowdsourcing for Multilabel Annotation

IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):3549-3559. doi: 10.1109/TNNLS.2022.3194022. Epub 2024 Feb 29.

Abstract

Multilabel annotation is a critical step in generating training sets for classification models across application domains, but asking domain experts to provide labels is usually time-consuming and expensive and cannot keep pace with the rapid model iteration demanded in the big data era. Although crowdsourcing offers a fast way to acquire labels for multilabel learning, it risks high data acquisition costs and low label quality. This article proposes a novel one-coin label-dependent active crowdsourcing (OCLDAC) method that iteratively queries noisy labels from crowd workers and learns multilabel classification models. In each active-learning iteration, integrated labels are first inferred by a novel one-coin label-dependent model, which uses a mixture of multiple independent Bernoulli distributions to explore and exploit correlations among the labels, increasing the accuracy of truth inference. Then, instances, labels, and workers are selected according to novel strategies that incorporate the distribution of noisy labels, the prediction probabilities of the learning models, label correlations, and the reliability of crowd workers. Simulations on eight multilabel datasets and an evaluation on one real-world crowdsourcing dataset consistently show that the proposed OCLDAC significantly outperforms state-of-the-art methods and their variants.
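
To make the "one-coin" idea concrete, the sketch below gives a minimal EM-style truth inference for a single binary label, where each worker j is described by one reliability parameter p_j = P(vote equals the true label) regardless of the label's value. This is only an illustration of the basic one-coin model, not the paper's full OCLDAC method, which additionally couples labels through a mixture of Bernoulli distributions and drives the active selection of instances, labels, and workers; all function and variable names here are hypothetical. The trailing selection_score is likewise a simplified uncertainty-style stand-in for the paper's multi-criteria selection strategy.

```python
import numpy as np

def one_coin_em(votes, n_iter=50, eps=1e-9):
    """Minimal one-coin truth inference for one binary label via EM.

    votes : (n_instances, n_workers) array with entries in {0, 1} and
            np.nan where a worker did not annotate an instance.
    Returns (mu, p): posterior P(y_i = 1) per instance and the
    estimated reliability p_j per worker.
    """
    observed = ~np.isnan(votes)
    v = np.nan_to_num(votes)            # NaN -> 0; masked out below
    mu = np.nanmean(votes, axis=1)      # initialize posterior by majority vote
    p = np.full(votes.shape[1], 0.7)    # initial worker reliabilities
    for _ in range(n_iter):
        # M-step: each worker's expected agreement with the current truth.
        agree = np.where(observed,
                         v * mu[:, None] + (1 - v) * (1 - mu[:, None]), 0.0)
        p = (agree.sum(axis=0) + eps) / (observed.sum(axis=0) + 2 * eps)
        # E-step: log-likelihood of each instance's votes under y=1 vs y=0,
        # assuming workers vote independently given the true label.
        lp1 = np.where(observed,
                       v * np.log(p + eps) + (1 - v) * np.log(1 - p + eps),
                       0.0).sum(axis=1)
        lp0 = np.where(observed,
                       (1 - v) * np.log(p + eps) + v * np.log(1 - p + eps),
                       0.0).sum(axis=1)
        mu = 1.0 / (1.0 + np.exp(lp0 - lp1))   # posterior P(y_i = 1)
    return mu, p

def selection_score(mu):
    # Uncertainty-style score: instances whose posterior is nearest 0.5
    # are the most informative to re-query. This is a simplification of
    # the paper's strategy, which also weighs label correlations, model
    # predictions, and worker reliability.
    return 1.0 - np.abs(2.0 * mu - 1.0)

if __name__ == "__main__":
    # Tiny synthetic demo: 200 instances, 4 workers of varying reliability.
    rng = np.random.default_rng(0)
    truth = rng.integers(0, 2, size=200)
    reliab = np.array([0.9, 0.8, 0.6, 0.55])
    flips = rng.random((200, 4)) < reliab
    votes = np.where(flips, truth[:, None], 1 - truth[:, None]).astype(float)
    mu, p_hat = one_coin_em(votes)
    print("accuracy:", np.mean((mu > 0.5) == truth))
    print("estimated reliabilities:", np.round(p_hat, 2))
```

In this toy setting the EM loop typically recovers the workers' reliabilities and the true labels well above majority-vote accuracy; the OCLDAC model described in the abstract builds on this base by modeling dependencies among the multiple labels rather than inferring each label independently.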