LRAF-Net: Long-Range Attention Fusion Network for Visible-Infrared Object Detection

IEEE Trans Neural Netw Learn Syst. 2023 Jun 6:PP. doi: 10.1109/TNNLS.2023.3266452. Online ahead of print.

Abstract

Visible-infrared object detection aims to improve detector performance by exploiting the complementary information in visible and infrared images. However, most existing methods use only local intramodality information to enhance the feature representation and ignore the latent interactions arising from long-range dependencies between modalities, which leads to unsatisfactory detection performance in complex scenes. To address this problem, we propose a feature-enhanced long-range attention fusion network (LRAF-Net), which improves detection performance by fusing the long-range dependencies of the enhanced visible and infrared features. First, a two-stream CSPDarknet53 network extracts deep features from the visible and infrared images, and a novel data augmentation (DA) method based on asymmetric complementary masks is designed to reduce the bias toward a single modality. Then, we propose a cross-feature enhancement (CFE) module that improves the intramodality feature representation by exploiting the discrepancy between visible and infrared images. Next, we propose a long-range dependence fusion (LDF) module that fuses the enhanced features by associating the positional encodings of the multimodal features. Finally, the fused features are fed into a detection head to obtain the final detection results. Experiments on several public datasets (VEDAI, FLIR, and LLVIP) show that the proposed method achieves state-of-the-art performance compared with other methods.
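The abstract does not say how the asymmetric complementary masks are constructed. The Python sketch below assumes one plausible reading: the masked patches of the two images never overlap, so every location stays visible in at least one modality (complementary), while the two modalities lose different fractions of patches (asymmetric) and the modality that is masked more is chosen at random. The function names, patch size, and mask ratios are hypothetical and are not taken from LRAF-Net.

```python
import random
from typing import List, Tuple

# Hypothetical sketch of an asymmetric complementary mask augmentation;
# not the LRAF-Net reference implementation.
Grid = List[List[bool]]      # True marks a patch that will be blanked out
Image = List[List[float]]    # single-channel image as rows of pixel values


def complementary_patch_masks(grid_h: int, grid_w: int, ratio_vis: float,
                              ratio_ir: float, rng: random.Random) -> Tuple[Grid, Grid]:
    """Draw two non-overlapping patch masks with (possibly) different ratios."""
    if ratio_vis < 0 or ratio_ir < 0 or ratio_vis + ratio_ir > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    n = grid_h * grid_w
    order = list(range(n))
    rng.shuffle(order)
    n_vis = int(ratio_vis * n)   # floor keeps n_vis + n_ir <= n
    n_ir = int(ratio_ir * n)
    # Disjoint slices of one shuffled order guarantee the masks never overlap.
    vis_idx = set(order[:n_vis])
    ir_idx = set(order[n_vis:n_vis + n_ir])
    mask_vis = [[r * grid_w + c in vis_idx for c in range(grid_w)] for r in range(grid_h)]
    mask_ir = [[r * grid_w + c in ir_idx for c in range(grid_w)] for r in range(grid_h)]
    return mask_vis, mask_ir


def apply_patch_mask(image: Image, mask: Grid, patch: int, fill: float = 0.0) -> Image:
    """Replace every pixel inside a masked patch with `fill`."""
    return [[fill if mask[y // patch][x // patch] else v for x, v in enumerate(row)]
            for y, row in enumerate(image)]


def augment_pair(vis: Image, ir: Image, patch: int, ratio_hi: float,
                 ratio_lo: float, rng: random.Random) -> Tuple[Image, Image]:
    """Mask a registered visible/infrared pair with complementary masks."""
    if len(vis) != len(ir) or len(vis[0]) != len(ir[0]):
        raise ValueError("visible and infrared images must have the same size")
    grid_h, grid_w = len(vis) // patch, len(vis[0]) // patch
    # Assumption: randomly choose which modality loses more patches so that
    # neither stream is systematically favored during training.
    if rng.random() < 0.5:
        r_vis, r_ir = ratio_hi, ratio_lo
    else:
        r_vis, r_ir = ratio_lo, ratio_hi
    m_vis, m_ir = complementary_patch_masks(grid_h, grid_w, r_vis, r_ir, rng)
    return apply_patch_mask(vis, m_vis, patch), apply_patch_mask(ir, m_ir, patch)


if __name__ == "__main__":
    vis = [[1.0] * 8 for _ in range(8)]
    ir = [[2.0] * 8 for _ in range(8)]
    v_aug, i_aug = augment_pair(vis, ir, patch=2, ratio_hi=0.5, ratio_lo=0.25,
                                rng=random.Random(0))
    for row_v, row_i in zip(v_aug, i_aug):
        print(row_v, row_i)
```

On an 8×8 pair with 2×2 patches, one image loses 8 of its 16 patches and the other loses 4, and no pixel is blanked in both images. Drawing both masks from one shuffled patch order is simply a cheap way to enforce disjointness; the paper's actual masking scheme may differ.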