Feature Pyramid Reconfiguration with Consistent Loss for Object Detection

IEEE Trans Image Process. 2019 May 24. doi: 10.1109/TIP.2019.2917781. Online ahead of print.

Abstract

Exploiting feature pyramids has become a crucial way to boost object detection performance. While various pyramid representations have been developed, previous works still integrate semantic information across different scales inefficiently. Moreover, recent object detectors struggle to localize objects accurately, mainly due to the coarse definition of "positive" examples during the training and prediction phases. In this paper, we begin by analyzing current pyramid solutions, and then propose a novel architecture that reconfigures the feature hierarchy in a flexible yet effective way. In particular, our architecture consists of two lightweight and trainable processes: global attention and local reconfiguration. Global attention emphasizes the global information of each feature scale, while local reconfiguration captures the local correlations across different scales. Both processes are non-linear and thus more expressive. We further show that the loss function used to train object detectors is the central cause of the inaccurate localization problem, and we address this issue by reshaping the standard cross entropy loss so that it focuses more on accurate predictions. Both the feature reconfiguration and the consistent loss can be incorporated into popular one-stage (SSD, RetinaNet) and two-stage (Faster R-CNN) detection frameworks. Extensive experimental evaluations on the PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO datasets demonstrate that our models achieve consistent and significant improvements over other state-of-the-art methods.
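To make the two processes and the reshaped loss concrete, the sketch below gives a minimal PyTorch-style rendering. It is an illustration under stated assumptions, not the paper's exact design: the squeeze-and-excitation-style channel gate, the 1x1/3x3 fusion stack, the module names, and the IoU^gamma weighting of the cross entropy are all assumptions chosen to match the abstract's description of emphasizing global per-scale information, capturing local cross-scale correlations, and focusing the classification loss on accurately located predictions.

```python
# Illustrative sketch only (assumed design, not the authors' exact formulation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalAttention(nn.Module):
    """Re-weight channels of one pyramid level using its global context
    (a squeeze-and-excitation-style gate; this specific gate is an assumption)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (N, C, H, W) feature map from one pyramid scale
        gate = self.fc(x.mean(dim=(2, 3)))          # global pooling -> channel gate
        return x * gate.view(x.size(0), -1, 1, 1)   # emphasize globally informative channels


class LocalReconfiguration(nn.Module):
    """Fuse all pyramid levels into one target scale with a small non-linear
    conv stack, capturing local correlations across scales (design assumed)."""

    def __init__(self, channels, num_levels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(channels * num_levels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, features, target_size):
        # Resize every level to the target resolution, then fuse locally.
        resized = [F.interpolate(f, size=target_size, mode="nearest") for f in features]
        return self.fuse(torch.cat(resized, dim=1))


def consistency_weighted_ce(cls_logits, labels, ious, gamma=2.0):
    """One plausible reshaping of the cross entropy: positive examples are
    weighted by their localization quality (IoU with the matched ground truth),
    so the classifier focuses on accurately located predictions.
    The IoU**gamma weighting is an assumption made for illustration."""
    ce = F.cross_entropy(cls_logits, labels, reduction="none")
    weights = torch.where(labels > 0, ious.clamp(min=0.0) ** gamma, torch.ones_like(ious))
    return (weights * ce).mean()
```

In this reading, each pyramid level would pass through `GlobalAttention` before all levels are fused per output scale by `LocalReconfiguration`, and `consistency_weighted_ce` would stand in for the standard classification loss in an SSD, RetinaNet, or Faster R-CNN head.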