Go Deep or Broad? Exploit Hybrid Network Architecture for Weakly Supervised Object Classification and Localization

IEEE Trans Neural Netw Learn Syst. 2023 Jan 3:PP. doi: 10.1109/TNNLS.2022.3225180. Online ahead of print.

Abstract

Weakly supervised object classification and localization learn object classes and locations using only image-level labels, as opposed to bounding box annotations. Conventional deep convolutional neural network (CNN)-based methods activate the most discriminative part of an object in feature maps and then attempt to expand feature activation to the whole object, which deteriorates classification performance. In addition, those methods use only the most semantic information in the last feature map, ignoring the role of shallow features. It thus remains a challenge to enhance classification and localization performance within a single framework. In this article, we propose a novel hybrid network, namely the deep and broad hybrid network (DB-HybridNet), which combines deep CNNs with a broad learning network to learn discriminative and complementary features from different layers, and then integrates multilevel features (i.e., high-level semantic features and low-level edge features) in a global feature augmentation module. Importantly, we exploit different combinations of deep features and broad learning layers in DB-HybridNet and design an iterative training algorithm based on gradient descent to ensure that the hybrid network works in an end-to-end framework. Through extensive experiments on the Caltech-UCSD Birds (CUB)-200 and ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2016 datasets, we achieve state-of-the-art classification and localization performance.
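To make the deep-plus-broad idea concrete, the following is a minimal NumPy sketch of a broad-learning-style feature augmentation: stand-in "deep" CNN features are mapped to random feature nodes and enhancement nodes, concatenated into an augmented representation, and classified by a closed-form ridge-regression readout. This is an illustrative assumption about the general broad-learning mechanism, not the paper's actual DB-HybridNet architecture or its global feature augmentation module; all function names, dimensions, and hyperparameters here are hypothetical.

```python
import numpy as np

def broad_augment(features, n_feature_nodes=128, n_enhance_nodes=64, seed=0):
    """Map input features to broad feature nodes, then enhancement nodes,
    and concatenate everything (a minimal broad-learning-style augmentation)."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    W_f = rng.standard_normal((d, n_feature_nodes)) * 0.1
    Z = np.tanh(features @ W_f)                      # mapped feature nodes
    W_e = rng.standard_normal((n_feature_nodes, n_enhance_nodes)) * 0.1
    H = np.tanh(Z @ W_e)                             # enhancement nodes
    return np.concatenate([features, Z, H], axis=1)  # augmented representation

def ridge_readout(A, Y, lam=1e-2):
    """Closed-form ridge-regression output weights, as commonly used in
    broad learning systems: W = (A^T A + lam*I)^-1 A^T Y."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)

# Toy usage: 32 "images" with 64-dim stand-in deep features, 5 classes.
rng = np.random.default_rng(1)
deep_feats = rng.standard_normal((32, 64))   # placeholder for CNN features
labels = np.eye(5)[rng.integers(0, 5, 32)]   # one-hot image-level labels
A = broad_augment(deep_feats)                # shape (32, 64+128+64) = (32, 256)
W = ridge_readout(A, labels)
logits = A @ W                               # shape (32, 5)
```

In the full method the readout would be trained jointly with the CNN by gradient descent rather than solved in closed form, which is what the paper's iterative end-to-end training algorithm addresses.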