Relation3DMOT: Exploiting Deep Affinity for 3D Multi-Object Tracking from View Aggregation

Sensors (Basel). 2021 Mar 17;21(6):2113. doi: 10.3390/s21062113.

Abstract

Autonomous systems need to localize and track surrounding objects in 3D space for safe motion planning, so 3D multi-object tracking (MOT) plays a vital role in autonomous navigation. Most MOT methods follow a tracking-by-detection pipeline, which comprises two tasks: object detection and data association. However, many approaches detect objects in 2D RGB sequences for tracking, which is unreliable for localizing objects in 3D space. Furthermore, it remains challenging to learn discriminative features for temporally consistent detections across frames, and the affinity matrix is typically learned from independent object features without considering the feature interactions between objects detected in different frames. To address these problems, we first employ a joint feature extractor to fuse the appearance features and motion features captured from 2D RGB images and 3D point clouds, and we then propose a novel convolutional operation, named RelationConv, to better exploit the correlation between each pair of objects in adjacent frames and to learn a deep affinity matrix for data association. Finally, extensive evaluation shows that our proposed model achieves state-of-the-art performance on the KITTI tracking benchmark.
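For intuition, the sketch below shows one common way to realize pairwise affinity learning between detections in two adjacent frames: every feature pair is concatenated and scored by shared 1x1 convolutions, yielding an N x M affinity matrix. This is a minimal illustration under assumptions; the class name PairwiseAffinityHead, the layer sizes, and the 1x1-conv relation operator are hypothetical, not the paper's actual RelationConv implementation.

```python
import torch
import torch.nn as nn

class PairwiseAffinityHead(nn.Module):
    """Illustrative head that turns per-object features from two adjacent
    frames into an N x M affinity matrix via pairwise feature interaction.
    Hypothetical sketch, not the paper's exact RelationConv operation."""

    def __init__(self, feat_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        # 1x1 convolutions act on every (i, j) feature pair independently,
        # so each affinity entry is learned from the interaction of one
        # detection in frame t with one detection in frame t+1.
        self.relation = nn.Sequential(
            nn.Conv2d(2 * feat_dim, hidden_dim, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_dim, 1, kernel_size=1),
        )

    def forward(self, feats_t: torch.Tensor, feats_t1: torch.Tensor) -> torch.Tensor:
        # feats_t:  (N, D) fused appearance+motion features in frame t
        # feats_t1: (M, D) fused features in frame t+1
        n, d = feats_t.shape
        m, _ = feats_t1.shape
        # Build all N*M feature pairs by broadcasting, then concatenate
        # along the channel dimension: (2D, N, M).
        a = feats_t.t().unsqueeze(2).expand(d, n, m)   # (D, N, M)
        b = feats_t1.t().unsqueeze(1).expand(d, n, m)  # (D, N, M)
        pairs = torch.cat([a, b], dim=0).unsqueeze(0)  # (1, 2D, N, M)
        return self.relation(pairs).squeeze(0).squeeze(0)  # (N, M) affinity


# Usage: score 5 detections in frame t against 7 detections in frame t+1.
head = PairwiseAffinityHead(feat_dim=128)
affinity = head(torch.randn(5, 128), torch.randn(7, 128))
print(affinity.shape)  # torch.Size([5, 7])
```

The resulting matrix can then be fed to a matching step (e.g., the Hungarian algorithm) to associate detections across frames; the key point the abstract makes is that each entry is learned from the joint pair of features rather than from independent per-object embeddings.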

Keywords: 3D multi-object tracking; deep affinity; neural network; relation learning; sensor fusion.