From Discriminant to Complete: Reinforcement Searching-Agent Learning for Weakly Supervised Object Detection

Dingwen Zhang; Junwei Han; Long Zhao; Tao Zhao

doi:10.1109/TNNLS.2020.2969483

From Discriminant to Complete: Reinforcement Searching-Agent Learning for Weakly Supervised Object Detection

IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5549-5560. doi: 10.1109/TNNLS.2020.2969483. Epub 2020 Nov 30.

Authors

Dingwen Zhang, Junwei Han, Long Zhao, Tao Zhao

PMID: 32092016
DOI: 10.1109/TNNLS.2020.2969483

Abstract

Weakly supervised object detection (WSOD) is an interesting yet challenging task in the computer vision community. The core is to discover the image regions that contain the complete object instances under the image-level supervision. Existing works usually solve this problem via a proposal selection strategy, which selects the most discriminative box regions from the weakly labeled training images. However, these regions usually only contain the discriminative object parts rather than the complete object instances. To address this problem, this article proposes to learn a searching-agent to gradually mine desirable object regions under a region searching paradigm, where we formulate the searching process as a Markov decision process and learn the searching-agent under a deep reinforcement learning framework. To learn such a searching-agent under the weak supervision, we extract the pseudo-complete object regions and the corresponding local discriminative object parts and introduce the obtained pseudo-target-part training pairs into the reinforcement learning process of the search-agent. This learning strategy has twofold advantages: 1) it can mimic the searching process to reveal complete object regions from a certain discriminative part of the object under the weak supervision and 2) it will not suffer from the learning difficulty arise from the long-action sequence that happens when searching from the entire image range. Comprehensive experiments on benchmark data sets demonstrate that by integrating the learned searching-agent with the existing WSOD method, we can achieve better performance than the other state-of-the-art and baseline methods.