MME-YOLO: Multi-Sensor Multi-Level Enhanced YOLO for Robust Vehicle Detection in Traffic Surveillance

Sensors (Basel). 2020 Dec 23;21(1):27. doi: 10.3390/s21010027.

Abstract

As an effective means of mitigating the collision risks caused by the limited on-board perspective, cooperative roadside systems are gaining popularity. To improve vehicle detection in such online safety systems, we propose a novel multi-sensor multi-level enhanced convolutional network, MME-YOLO, designed for the real-world variability of scale, illumination, and occlusion. MME-YOLO consists of two tightly coupled structures: an enhanced inference head and a LiDAR-Image composite module. The enhanced inference head equips the network with stronger inference over redundant visual cues through attention-guided feature selection blocks and an anchor-based/anchor-free ensemble head. The LiDAR-Image composite module then cascades multi-level feature maps from the LiDAR subnet into the image subnet, strengthening the detector's generalization in complex scenarios. Compared with YOLOv3, the enhanced inference head improves mAP by 5.83% and 4.88% on the visual datasets LVSH and UA-DETRAC, respectively. Integrated with the composite module, the overall architecture achieves 91.63% mAP on the collected Road-side Dataset. Experiments show that even under abnormal lighting and inconsistent object scales at evening rush hours, the proposed MME-YOLO maintains reliable recognition accuracy and robust detection performance.
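The abstract describes the LiDAR-Image composite module as cascading multi-level feature maps from a LiDAR subnet into an image subnet. A minimal sketch of one plausible realization of such a cascade is channel-wise concatenation at each pyramid level; the function name, level count, and channel dimensions below are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def fuse_pyramids(image_feats, lidar_feats):
    """Hypothetical multi-level fusion: concatenate LiDAR feature maps
    onto image feature maps along the channel axis at each pyramid level.
    Each feature map is channels-first, shape (C, H, W)."""
    return [np.concatenate([img, lid], axis=0)
            for img, lid in zip(image_feats, lidar_feats)]

# Illustrative three-level feature pyramid (YOLOv3-style spatial sizes).
image_feats = [np.zeros((256, 52, 52)), np.zeros((512, 26, 26)), np.zeros((1024, 13, 13))]
lidar_feats = [np.zeros((64, 52, 52)), np.zeros((128, 26, 26)), np.zeros((256, 13, 13))]

fused = fuse_pyramids(image_feats, lidar_feats)
# Channel counts add; spatial resolutions are unchanged at each level.
```

In a real detector the fused maps would typically pass through further convolutions before the detection heads; the concatenation here only illustrates how the two modality streams can be joined level by level.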

Keywords: complex scenes; multi-scales; multi-sensor fusion; smart city; vehicle detection.