PDT-YOLO: A Roadside Object-Detection Algorithm for Multiscale and Occluded Targets

Ruoying Liu; Miaohua Huang; Liangzi Wang; Chengcheng Bi; Ye Tao

doi:10.3390/s24072302

Sensors (Basel). 2024 Apr 4;24(7):2302. doi: 10.3390/s24072302.

Authors

Ruoying Liu ¹ ² ³, Miaohua Huang ¹ ² ³, Liangzi Wang ¹ ² ³, Chengcheng Bi ¹ ² ³, Ye Tao ¹ ² ³

Affiliations

¹ Hubei Key Laboratory of Advanced Technology for Automotive Components, Wuhan University of Technology, Wuhan 430070, China.
² Hubei Collaborative Innovation Center for Automotive Components Technology, Wuhan University of Technology, Wuhan 430070, China.
³ Hubei Research Center for New Energy & Intelligent Connected Vehicle, Wuhan 430070, China.

Abstract

To address weak perception of multi-scale objects, high missed-detection rates for occluded targets, and the difficulty of model deployment in intelligent roadside perception systems, the PDT-YOLO algorithm based on YOLOv7-tiny is proposed. First, we introduce the intra-scale feature interaction module (AIFI) and reconstruct the feature pyramid structure to improve detection accuracy for multi-scale targets. Second, a lightweight convolution module (GSConv) is used to construct a multi-scale efficient layer aggregation network module (ETG), which strengthens feature extraction while keeping the model lightweight. Third, multiple attention mechanisms are integrated to improve the feature representation of occluded targets in complex scenarios. Finally, Wise-IoU with a dynamic non-monotonic focusing mechanism improves the accuracy and generalization ability of the model. Compared with YOLOv7-tiny, PDT-YOLO improves mAP50 and mAP50:95 by 4.6% and 12.8% on the DAIR-V2X-C dataset, with a parameter count of 6.1 million, and by 15.7% and 11.1% on the IVODC dataset. We deployed PDT-YOLO in an actual traffic environment based on the Robot Operating System (ROS), achieving a detection frame rate of 90 FPS, which can meet the needs of roadside object detection and edge deployment in complex traffic scenes.
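The abstract names Wise-IoU with a dynamic non-monotonic focusing mechanism but does not give its form. As a rough illustration only, the sketch below follows the formulation commonly associated with the Wise-IoU v3 variant: a distance-weighted IoU loss scaled by a gain that depends on how far a box's IoU loss lies from the running mean. The function names, the choice of the v3 variant, and the hyperparameters `alpha=1.9` and `delta=3.0` are assumptions, not details taken from this paper. A training implementation would also exclude the enclosing-box term from gradient computation and maintain the mean IoU loss as a running average; plain Python has no gradients, so both are simplified here.

```python
import math

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def wiou_v1(pred, gt):
    """Distance-weighted IoU loss; returns (loss, plain IoU loss)."""
    l_iou = 1.0 - iou(pred, gt)
    # Squared distance between the two box centers.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    penalty = math.exp((dx ** 2 + dy ** 2) / (wg ** 2 + hg ** 2))
    return penalty * l_iou, l_iou

def focusing_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gain: rises, peaks, then decays as beta grows."""
    return beta / (delta * alpha ** (beta - delta))

def wiou_v3(pred, gt, mean_l_iou, alpha=1.9, delta=3.0):
    """Assumed v3 form: focusing gain times the distance-weighted loss."""
    loss_v1, l_iou = wiou_v1(pred, gt)
    beta = l_iou / mean_l_iou  # outlier degree relative to the mean loss
    return focusing_gain(beta, alpha, delta) * loss_v1
```

Under these assumed hyperparameters the gain equals 1 when the outlier degree equals `delta` and shrinks for much larger values, so very poor boxes contribute less to the loss than moderately poor ones.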

Keywords: YOLOv7-tiny; intra-scale feature interaction module; multi-attention mechanism; multi-scale efficient layer aggregation network; roadside perception; robot operating system.

Grants and funding

20231j0195/Hubei Provincial Natural Science Foundation