Foreground Capture Feature Pyramid Network-Oriented Object Detection in Complex Backgrounds

IEEE Trans Neural Netw Learn Syst. 2024 Apr 22:PP. doi: 10.1109/TNNLS.2024.3387282. Online ahead of print.

Abstract

Feature pyramids are widely adopted in visual detection models to capture multiscale features of objects. In practical object detection tasks, however, feature pyramids are prone to interference from complex backgrounds, which results in suboptimal capture of discriminative multiscale foreground semantic features. In this article, a foreground capture feature pyramid network (FCFPN) for multiscale object detection is proposed to address the problem of inadequate feature learning in complex backgrounds. FCFPN consists of a foreground dual attention (FDA) module and a pathway aggregation (PA) structure. Specifically, the FDA module activates top-down foreground channel responses and lateral spatial foreground location features, so that both channel and spatial foreground features are adequately captured. The PA structure then adaptively learns the fusion weights of multiscale features at different levels of the feature pyramid, which enhances the complementarity of semantic information between different levels of the foreground feature maps. Because the fusion weights are learned adaptively for each pyramid level, the detection model retains the useful information gained from features of different sizes and suppresses conflicting information. Evaluations on public datasets and a self-built complex background dataset show that the detection average precision (AP) and the feature learning performance of the proposed method are superior to those of other FPNs, which demonstrates the effectiveness of the proposed FCFPN.
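The two ideas named in the abstract, a dual (channel and spatial) attention gate and adaptively weighted fusion across pyramid levels, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names (`dual_attention`, `adaptive_fuse`), the sigmoid gates, the pooling choices, and the softmax normalization of the fusion weights are all assumptions made for illustration. In a trained model the gates and the fusion logits would presumably come from learned layers rather than from plain pooling and fixed numbers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def upsample2x(x):
    # Nearest-neighbour upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def dual_attention(top, lateral):
    """Hypothetical stand-in for a foreground dual attention step.

    top:     (C, H/2, W/2) feature from the coarser (top-down) level
    lateral: (C, H, W)     feature from the lateral connection
    """
    # Assumed channel gate: global average pool of the top-down feature.
    channel_gate = sigmoid(top.mean(axis=(1, 2)))[:, None, None]  # (C, 1, 1)
    # Assumed spatial gate: channel-wise mean of the lateral feature.
    spatial_gate = sigmoid(lateral.mean(axis=0, keepdims=True))   # (1, H, W)
    merged = lateral + upsample2x(top)
    return merged * channel_gate * spatial_gate

def adaptive_fuse(features, logits):
    """Hypothetical adaptive fusion of same-shape feature maps.

    features: list of (C, H, W) arrays, already resized to a common shape
    logits:   one scalar per level (learnable in a real model)
    """
    weights = softmax(np.asarray(logits, dtype=float))
    fused = sum(w * f for w, f in zip(weights, features))
    return fused, weights

rng = np.random.default_rng(0)
top = rng.normal(size=(4, 2, 2))
lateral = rng.normal(size=(4, 4, 4))
attended = dual_attention(top, lateral)            # shape (4, 4, 4)
fused, weights = adaptive_fuse([attended, lateral], logits=[0.3, -0.1])
```

Normalizing the per-level weights so they sum to one is one simple way to let a model emphasize some levels and down-weight others; whether FCFPN normalizes its weights this way is not stated in the abstract.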