Embedding Attention and Residual Network for Accurate Salient Object Detection

IEEE Trans Cybern. 2020 May;50(5):2050-2062. doi: 10.1109/TCYB.2018.2879859. Epub 2018 Nov 27.

Abstract

Salient object detection is usually used as a preprocessing step to facilitate a variety of subsequent applications, so it should incur little time cost. With the rapid development of deep learning, substantial progress has been made, yielding new state-of-the-art performance. However, the features learned by existing deep learning-based methods are not accurate enough, leading to unsatisfactory detection in complex scenes, such as those with low contrast or high similarity between the salient object and the background region, or with multiple (small) salient objects of diverse characteristics. In addition, post-processing techniques are usually needed for refinement, which is time-consuming. To address these issues, this paper presents an efficient fully convolutional salient object detection network. Specifically, we first introduce a visual attention mechanism to guide feature learning in the side output layers. In detail, attention weights are applied in a top-down manner, bridging high-level semantic information to help shallow layers better locate salient objects and filter out noisy responses in the background region. Second, we propose a residual refinement network to fuse the learned multilevel features gradually. Rather than simply adding or concatenating them step by step as in previous works, we introduce a second-order term into the element-wise addition to learn stage-wise residual features for refinement. This second-order term not only benefits efficient gradient propagation but also increases network nonlinearity. Extensive experiments on seven standard benchmarks demonstrate that the proposed approach achieves consistently superior performance and performs well on small salient object detection in comparison with very recent state-of-the-art methods, especially in terms of the structure-measure metric.
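
The abstract does not state the exact formulation of either component, so the following is only a rough sketch under stated assumptions, not the paper's actual network. It assumes that the top-down attention weight is a sigmoid of the upsampled deeper response, and that the second-order term is the element-wise product of the two feature maps added to their sum. The function names (`top_down_attention`, `second_order_fusion`, `upsample2x`) and the single-channel NumPy arrays standing in for feature maps are hypothetical, chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample2x(x):
    # Nearest-neighbour upsampling of an (H, W) map by a factor of 2.
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def top_down_attention(shallow, deep):
    # ASSUMPTION: the attention weight is a sigmoid of the upsampled deeper
    # (more semantic) response; it rescales the shallow feature map so that
    # responses the deeper layer considers background are damped.
    weight = sigmoid(upsample2x(deep))
    return shallow * weight

def second_order_fusion(coarse, fine):
    # ASSUMPTION: plain element-wise addition (coarse + fine) is extended
    # with a second-order (multiplicative) term coarse * fine.
    return coarse + fine + coarse * fine

# Toy single-channel feature maps: a shallow 8x8 map and a deeper 4x4 map.
rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 8))
deep = rng.standard_normal((4, 4))

attended = top_down_attention(shallow, deep)
fused = second_order_fusion(upsample2x(deep), attended)
```

Under these assumptions, the product term makes the gradient with respect to one input depend on the other input, which is one way a multiplicative term can add nonlinearity beyond a plain sum.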