Supplement and Suppression: Both Boundary and Nonboundary Are Helpful for Salient Object Detection

IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):6615-6627. doi: 10.1109/TNNLS.2021.3127959. Epub 2023 Sep 1.

Abstract

Current methods aggregate multilevel features from the backbone and introduce edge information to obtain more refined saliency maps. However, little attention has been paid to suppressing background regions with saliency-like appearances. These regions usually lie in the vicinity of salient objects and have high contrast with the rest of the background, so they are easily misclassified as foreground. To solve this problem, we propose a gated feature interaction network (GFINet) to integrate multiple saliency features; it utilizes nonboundary features carrying background information to suppress pseudosalient objects and simultaneously applies boundary features to supplement edge details. Unlike previous methods that only consider the complementarity between saliency and boundary, the proposed network introduces nonboundary features into the decoder to filter out pseudosalient objects. Specifically, GFINet consists of a global features aggregation branch (GFAB), a boundary and nonboundary features' perception branch (B&NFPB), and a gated feature interaction module (GFIM). Given the global features generated by GFAB and the boundary and nonboundary features produced by B&NFPB, GFIM employs a gate structure to adaptively optimize the information interchange among these features and thus predicts the final saliency maps. Besides, owing to the imbalanced distribution of boundary and nonboundary pixels, the binary cross-entropy (BCE) loss struggles to predict the pixels near the boundary. We therefore design a border region aware (BRA) loss that further boosts the quality of the boundary and nonboundary predictions by guiding the network to focus on the hard pixels near the boundary, assigning different weights to different positions. Experimental results on five benchmark datasets show that, compared with 12 counterparts, our method generalizes better and improves on the state-of-the-art approach by 4.85% on average in terms of regional and boundary evaluation measures. In addition, our model is efficient, with an inference speed of 50.3 FPS on a 320 × 320 image. Code has been made available at https://github.com/lesonly/GFINet.
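
To make the gated interaction idea concrete, below is a minimal PyTorch sketch of one plausible fusion step: a boundary stream gates the global features to supplement edge detail, while a nonboundary stream gates them to suppress pseudosalient background responses. The class name, gate construction, and combination rule are illustrative assumptions, not the authors' released GFIM implementation (see the repository above for that).

```python
# Hedged sketch of gated supplement-and-suppression fusion (assumed design,
# not the paper's exact GFIM).
import torch
import torch.nn as nn

class GatedFusionSketch(nn.Module):
    """Fuse global, boundary, and nonboundary features via spatial gates."""

    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv + sigmoid per auxiliary stream yields a spatial gate.
        self.boundary_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.nonboundary_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.fuse = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, global_feat, boundary_feat, nonboundary_feat):
        # Supplement: amplify global features near object contours.
        supplement = self.boundary_gate(boundary_feat) * global_feat
        # Suppression: damp global features where the nonboundary stream
        # indicates background, filtering pseudosalient regions.
        suppression = (1.0 - self.nonboundary_gate(nonboundary_feat)) * global_feat
        return self.fuse(global_feat + supplement + suppression)

if __name__ == "__main__":
    feats = [torch.randn(1, 64, 320, 320) for _ in range(3)]
    print(GatedFusionSketch(64)(*feats).shape)  # torch.Size([1, 1, 320, 320])
```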
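Similarly, the BRA loss can be read as a position-weighted BCE. The sketch below extracts a band around the ground-truth boundary with a morphological gradient (max pooling) and upweights pixels inside it; the band width, the weight value `alpha`, and the band-extraction method are assumptions for illustration and may differ from the paper's exact weighting.

```python
# Hedged sketch of a border-region-aware weighted BCE (assumed weighting,
# not the paper's exact BRA loss).
import torch
import torch.nn.functional as F

def bra_loss_sketch(pred_logits, gt, band: int = 5, alpha: float = 4.0):
    """BCE with extra weight on pixels near the ground-truth boundary.

    pred_logits, gt: tensors of shape (N, 1, H, W); gt is binary {0, 1}.
    """
    k = 2 * band + 1
    # Morphological dilation/erosion of the mask via max pooling.
    dilated = F.max_pool2d(gt, k, stride=1, padding=band)
    eroded = 1.0 - F.max_pool2d(1.0 - gt, k, stride=1, padding=band)
    border = dilated - eroded      # 1 inside the boundary band, 0 elsewhere
    weight = 1.0 + alpha * border  # hard pixels near the edge weigh more
    return F.binary_cross_entropy_with_logits(
        pred_logits, gt, weight=weight, reduction="mean")
```

In this reading, raising `alpha` shifts the optimization pressure toward the thin band where BCE alone is dominated by the far more numerous nonboundary pixels.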