Fusing Self-Attention and CoordConv to Improve the YOLOv5s Algorithm for Infrared Weak Target Detection

Sensors (Basel). 2023 Jul 28;23(15):6755. doi: 10.3390/s23156755.

Abstract

Convolutional neural networks have achieved good results in target detection in many application scenarios, but convolutional neural networks still face great challenges when facing scenarios with small target sizes and complex background environments. To solve the problem of low accuracy of infrared weak target detection in complex scenes, and considering the real-time requirements of the detection task, we choose the YOLOv5s target detection algorithm for improvement. We add the Bottleneck Transformer structure and CoordConv to the network to optimize the model parameters and improve the performance of the detection network. Meanwhile, a two-dimensional Gaussian distribution is used to describe the importance of pixel points in the target frame, and the normalized Guassian Wasserstein distance (NWD) is used to measure the similarity between the prediction frame and the true frame to characterize the loss function of weak targets, which will help highlight the targets with flat positional deviation transformation and improve the detection accuracy. Finally, through experimental verification, compared with other mainstream detection algorithms, the improved algorithm in this paper significantly improves the target detection accuracy, with the mAP reaching 96.7 percent, which is 2.2 percentage points higher compared with Yolov5s.

Keywords: CoordConv; NWD; YOLOv5s; multi-head self-attention; target detection.