An Improved YOLOv5-Based Underwater Object-Detection Framework

Sensors (Basel). 2023 Apr 3;23(7):3693. doi: 10.3390/s23073693.

Abstract

To date, general-purpose object-detection methods have achieved a great deal. However, challenges such as degraded image quality, complex backgrounds, and the detection of marine organisms at different scales arise when identifying underwater organisms. To solve such problems and further improve the accuracy of relevant models, this study proposes a marine biological object-detection architecture based on an improved YOLOv5 framework. First, the backbone framework of Real-Time Models for object Detection (RTMDet) is introduced. The core module, Cross-Stage Partial Layer (CSPLayer), includes a large convolution kernel, which allows the detection network to precisely capture contextual information more comprehensively. Furthermore, a common convolution layer is added to the stem layer, to extract more valuable information from the images efficiently. Then, the BoT3 module with the multi-head self-attention (MHSA) mechanism is added into the neck module of YOLOv5, such that the detection network has a better effect in scenes with dense targets and the detection accuracy is further improved. The introduction of the BoT3 module represents a key innovation of this paper. Finally, union dataset augmentation (UDA) is performed on the training set using the Minimal Color Loss and Locally Adaptive Contrast Enhancement (MLLE) image augmentation method, and the result is used as the input to the improved YOLOv5 framework. Experiments on the underwater datasets URPC2019 and URPC2020 show that the proposed framework not only alleviates the interference of underwater image degradation, but also makes the mAP@0.5 reach 79.8% and 79.4% and improves the mAP@0.5 by 3.8% and 1.1%, respectively, when compared with the original YOLOv8 on URPC2019 and URPC2020, demonstrating that the proposed framework presents superior performance for the high-precision detection of marine organisms.

Keywords: CSPNeXt block; YOLOv5; bottleneck transformer; object detection.