Point Cloud Registration-Driven Robust Feature Matching for 3-D Siamese Object Tracking

Haobo Jiang; Kaihao Lan; Le Hui; Guangyu Li; Jin Xie; Shangbing Gao; Jian Yang

doi:10.1109/TNNLS.2023.3325286

IEEE Trans Neural Netw Learn Syst. 2023 Nov 13:PP. doi: 10.1109/TNNLS.2023.3325286. Online ahead of print.

PMID: 37956012

Abstract

Learning robust feature matching between the template and the search area is crucial for 3-D Siamese tracking. The core of Siamese feature matching is assigning high feature similarity to corresponding points between the template and the search area for precise object localization. In this article, we propose a novel point cloud registration-driven Siamese tracking framework, based on the intuition that spatially aligned corresponding points (via 3-D registration) tend to achieve consistent feature representations. Specifically, our method consists of two modules: a tracking-specific nonlocal registration (TSNR) module and a registration-aided Sinkhorn template-feature aggregation module. The registration module targets precise spatial alignment between the template and the search area. A tracking-specific spatial distance constraint is proposed to refine the cross-attention weights in the nonlocal module for discriminative feature learning. We then use weighted singular value decomposition (SVD) to compute the rigid transformation between the template and the search area and align them, yielding the desired spatially aligned corresponding points. In the feature aggregation module, we formulate feature matching between the transformed template and the search area as an optimal transport problem and use Sinkhorn optimization to search for an outlier-robust matching solution. In addition, a registration-aided spatial distance map is built to improve matching robustness in indistinguishable regions (e.g., smooth surfaces). Finally, guided by the obtained feature matching map, we aggregate target information from the template into the search area to construct the target-specific feature, which is then fed into a CenterPoint-like detection head for object localization. Extensive experiments on the KITTI, NuScenes, and Waymo datasets verify the effectiveness of the proposed method.
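
The two numerical steps named in the abstract, weighted SVD for the rigid transformation and Sinkhorn iterations for optimal transport, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `weighted_rigid_transform` and `sinkhorn`, the uniform marginals, the squared-distance cost, and the `eps` and `n_iters` parameters are all assumptions made for illustration. The sketch also takes correspondences and weights as given, whereas the paper learns them, and it omits the outlier handling and the registration-aided spatial distance map described above.

```python
import numpy as np


def weighted_rigid_transform(src, dst, w):
    """Weighted Kabsch solve (illustrative, not the paper's code).

    Finds R, t minimizing sum_i w_i * ||R @ src[i] + t - dst[i]||^2
    for paired points src, dst of shape (N, 3) and weights w of shape (N,).
    """
    w = w / w.sum()
    src_c = (w[:, None] * src).sum(axis=0)  # weighted centroids
    dst_c = (w[:, None] * dst).sum(axis=0)
    # Weighted 3x3 cross-covariance of the centered point sets.
    H = (src - src_c).T @ (w[:, None] * (dst - dst_c))
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def sinkhorn(cost, eps=0.05, n_iters=100):
    """Entropy-regularized optimal transport with uniform marginals (assumed)."""
    n, m = cost.shape
    K = np.exp(-cost / eps)  # Gibbs kernel
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]  # soft matching map


# Toy usage: a synthetic "template" and a rigidly moved "search" copy.
rng = np.random.default_rng(0)
template = rng.normal(size=(32, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 0.1])
search = template @ R_true.T + t_true

R, t = weighted_rigid_transform(template, search, np.ones(len(template)))
aligned = template @ R.T + t
cost = ((aligned[:, None, :] - search[None, :, :]) ** 2).sum(axis=-1)
plan = sinkhorn(cost)
```

In this toy setting the correspondences are known exactly, so the SVD step recovers the synthetic rotation and translation, and the transport plan is computed on spatial distance alone. In the method described above, the matching is formulated on features of the transformed template and the search area, with spatial distance acting as an additional cue.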