Bag-Level Aggregation for Multiple-Instance Active Learning in Instance Classification Problems

Marc-Andre Carbonneau; Eric Granger; Ghyslain Gagnon

doi:10.1109/TNNLS.2018.2869164

IEEE Trans Neural Netw Learn Syst. 2019 May;30(5):1441-1451. doi: 10.1109/TNNLS.2018.2869164. Epub 2018 Oct 1.

PMID: 30281492

Abstract

A growing number of applications, e.g., video surveillance and medical image analysis, require training recognition systems from large amounts of weakly annotated data, while some targeted interactions with a domain expert are allowed to improve the training process. In such cases, active learning (AL) can reduce labeling costs for training a classifier by querying the expert to provide the labels of the most informative instances. This paper focuses on AL methods for instance classification problems in multiple-instance learning (MIL), where data are arranged into sets, called bags, which are weakly labeled. Most AL methods focus on single-instance learning problems. These methods are not suitable for MIL problems because they cannot account for the bag structure of the data. In this paper, new methods for bag-level aggregation of instance informativeness are proposed for multiple-instance AL (MIAL). The first method, aggregated informativeness, identifies the most informative instances based on classifier uncertainty and queries the bags that contain the most information. The second method, cluster-based aggregative sampling, clusters data hierarchically in the instance space. The informativeness of instances is assessed by considering bag labels, inferred instance labels, and the proportion of labels that remain to be discovered in clusters. Both proposed methods significantly outperform reference methods in extensive experiments using benchmark data from several application domains. Results indicate that using an appropriate strategy to address MIAL problems yields a significant reduction in the number of queries needed to achieve the same level of performance as single-instance AL methods.
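
The idea of aggregating instance-level informativeness into a bag-level query score can be illustrated with a minimal Python sketch. The abstract does not specify the scoring or aggregation rule, so everything here is an assumption made for illustration: the uncertainty measure 1 - |2p - 1|, the use of a plain sum as the aggregator, and the hypothetical function names `instance_uncertainty` and `select_bag`. It should not be read as the authors' implementation.

```python
import numpy as np


def instance_uncertainty(probs):
    """Map positive-class probabilities to an uncertainty score in [0, 1].

    Assumed measure (not from the paper): 1 at p = 0.5, 0 at p = 0 or p = 1.
    """
    probs = np.asarray(probs, dtype=float)
    return 1.0 - np.abs(2.0 * probs - 1.0)


def select_bag(probs, bag_ids, queried=()):
    """Return the not-yet-queried bag with the largest summed uncertainty.

    probs   : classifier output for each instance (probability of positive)
    bag_ids : bag identifier of each instance, same length as probs
    queried : identifiers of bags whose instance labels were already requested
    """
    bag_ids = np.asarray(bag_ids)
    uncertainty = instance_uncertainty(probs)
    already_queried = set(queried)

    scores = {}
    for bag in np.unique(bag_ids):
        key = bag.item()  # plain Python value, usable as a dict key
        if key in already_queried:
            continue
        # Assumed aggregator: sum of instance uncertainties inside the bag.
        scores[key] = float(uncertainty[bag_ids == bag].sum())

    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    probs = [0.9, 0.5, 0.1, 0.6, 0.4, 0.55]
    bag_ids = [0, 0, 0, 1, 1, 1]
    best, scores = select_bag(probs, bag_ids)
    print("bag to query:", best)
    print("bag scores:", scores)
```

In this toy input, bag 0 holds one maximally uncertain instance and two confident ones, while bag 1 holds three moderately uncertain instances, so a sum-based aggregator favors bag 1. A different aggregator (for example a mean or a maximum) would rank bags differently, which is the kind of design choice a bag-level strategy has to make.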