FPGA-Based Vehicle Detection and Tracking Accelerator

Jiaqi Zhai; Bin Li; Shunsen Lv; Qinglei Zhou

doi:10.3390/s23042208

Sensors (Basel). 2023 Feb 16;23(4):2208. doi: 10.3390/s23042208.

Authors

Jiaqi Zhai¹, Bin Li¹,², Shunsen Lv¹, Qinglei Zhou¹

Affiliations

¹ School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China.
² Henan Key Laboratory of Network Cryptography Technology, Zhengzhou 450001, China.

Abstract

Multiobject detection and tracking algorithms based on convolutional neural networks (CNNs) can be applied to vehicle detection and traffic flow statistics, enabling smart transportation. However, these algorithms have high computational complexity and a large number of model parameters, which makes it difficult to achieve high throughput at low power consumption on edge devices. To address these problems, we design and implement a low-power, low-latency, high-precision, and configurable vehicle detector based on a field-programmable gate array (FPGA), using the YOLOv3 (You Only Look Once, version 3) and YOLOv3-tiny CNNs and the Deepsort algorithm. First, we use a dynamic-threshold structured pruning method based on a scaling factor to significantly compress the detection model without reducing its accuracy. Second, a dynamic 16-bit fixed-point quantization algorithm is used to quantize the network parameters, reducing the memory footprint of the network model. Furthermore, we generate a reidentification (RE-ID) dataset from the UA-DETRAC dataset and use it to train the appearance feature extraction network of the Deepsort algorithm, improving vehicle tracking performance. Finally, we implement hardware optimization techniques such as memory interlayer multiplexing, parameter rearrangement, ping-pong buffering, multichannel transfer, pipelining, Im2col+GEMM, and Winograd algorithms to improve resource utilization and computational efficiency. The experimental results demonstrate that the compressed YOLOv3 and YOLOv3-tiny network models decrease in size by 85.7% and 98.2%, respectively. The dual-module parallel acceleration meets the demand of 6-way parallel video stream vehicle detection, with a peak throughput of 168.72 fps.

Keywords: DeepSort; FPGA; YOLO; accelerator architecture; vehicle detection.
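
The abstract does not spell out the details of the dynamic 16-bit fixed-point quantization, so the following is only a minimal sketch of one common form of the idea, not the authors' implementation. It assumes that each group of parameters gets its own fractional length, chosen so that the largest magnitude in the group still fits in a signed 16-bit word. The function names (`choose_frac_bits`, `quantize`, `dequantize`) and the per-group granularity are hypothetical.

```python
import math


def choose_frac_bits(values, total_bits=16):
    """Pick the fractional length for one group of parameters (assumed scheme).

    The largest magnitude decides how many integer bits are needed; all
    remaining bits (minus the sign bit) are spent on the fraction.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # may be <= 0 for small values
    return total_bits - 1 - int_bits


def quantize(values, total_bits=16):
    """Quantize floats to signed fixed-point integers with a shared fractional length."""
    frac_bits = choose_frac_bits(values, total_bits)
    scale = 2.0 ** frac_bits
    qmin = -(1 << (total_bits - 1))
    qmax = (1 << (total_bits - 1)) - 1
    # Round to the nearest representable step, then saturate to the 16-bit range.
    q = [min(qmax, max(qmin, round(v * scale))) for v in values]
    return q, frac_bits


def dequantize(q, frac_bits):
    """Map fixed-point integers back to floats."""
    scale = 2.0 ** frac_bits
    return [x / scale for x in q]


if __name__ == "__main__":
    weights = [0.75, -0.5, 0.1, -0.001]  # illustrative values, not from the paper
    q, frac_bits = quantize(weights)
    print(frac_bits, q, dequantize(q, frac_bits))
```

Under this assumed scheme, a group of small weights keeps up to 15 fractional bits, while a group with larger values trades fractional bits for integer range. That per-group adaptation is what makes the format "dynamic" in this sketch.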
