SASAN: Shape-Adaptive Set Abstraction Network for Point-Voxel 3D Object Detection

IEEE Trans Neural Netw Learn Syst. 2023 Dec 22:PP. doi: 10.1109/TNNLS.2023.3339889. Online ahead of print.

Abstract

Point-voxel 3D object detectors have achieved impressive performance in complex traffic scenes. However, they inherit 3D sparse convolution (spconv) layers with fixed receptive fields from voxel-based detectors and a fixed sphere radius from point-based methods when generating keypoint features, which makes them weak at adaptively modeling the various geometric deformations and sizes of real objects. To tackle this issue, we propose a shape-adaptive set abstraction network (SASAN) for point-voxel 3D object detection. First, the proposal and offset generation module learns the coordinates and confidences of 3D proposals and the shape-adaptive offsets of a fixed number of offset points for each voxel. Meanwhile, an extra offset supervision task guides the learning of the offset points' shifting values, encouraging the predicted offsets to better adapt to the various shapes of objects. Then, the shape-adaptive set abstraction module extracts multiscale keypoint features by grouping the features of neighboring offset points, together with features learned from adjacent raw points and the 2D bird's-eye-view map. Finally, the region of interest (RoI)-grid proposal refinement module aggregates the keypoint features for further proposal refinement and confidence prediction. Extensive experiments on the competitive KITTI 3D detection benchmark demonstrate that the proposed SASAN achieves superior performance compared with state-of-the-art methods.
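To make the shape-adaptive grouping step concrete, the following is a minimal NumPy sketch of the core idea described in the abstract: offset points are formed by adding learned, shape-adaptive offsets to voxel centers, and each keypoint then pools the features of its nearby offset points. All function and variable names here are illustrative assumptions, not the paper's actual implementation, which uses learned multiscale grouping inside a full detection network.

```python
import numpy as np

def shape_adaptive_set_abstraction(keypoints, voxel_centers, voxel_offsets,
                                   voxel_feats, radius=1.0, max_neighbors=16):
    """Hypothetical sketch of one grouping scale.

    keypoints:     (K, 3) keypoint coordinates
    voxel_centers: (V, 3) voxel center coordinates
    voxel_offsets: (V, 3) learned shape-adaptive offsets (predicted by the
                   proposal and offset generation module in the paper)
    voxel_feats:   (V, C) per-voxel features
    Returns (K, C) pooled keypoint features.
    """
    # Offset points: voxel centers shifted by the learned offsets, so the
    # sampling locations deform toward the object surface.
    offset_points = voxel_centers + voxel_offsets

    pooled = np.zeros((len(keypoints), voxel_feats.shape[1]))
    for i, kp in enumerate(keypoints):
        # Ball query around the keypoint: keep the closest offset points
        # within the radius, capped at max_neighbors.
        dists = np.linalg.norm(offset_points - kp, axis=1)
        nearest = np.argsort(dists)[:max_neighbors]
        nearest = nearest[dists[nearest] <= radius]
        if len(nearest):
            # Max-pool the grouped features, as in PointNet-style set
            # abstraction.
            pooled[i] = voxel_feats[nearest].max(axis=0)
    return pooled
```

In the full method this grouping would run at multiple radii, and the pooled offset-point features would be concatenated with features from raw points and the bird's-eye-view map before RoI-grid refinement.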