Deep Learning-Based Monocular 3D Object Detection with Refinement of Depth Information

Henan Hu; Ming Zhu; Muyu Li; Kwok-Leung Chan

doi:10.3390/s22072576

Sensors (Basel). 2022 Mar 28;22(7):2576. doi: 10.3390/s22072576.

Authors

Henan Hu¹,²,³, Ming Zhu¹, Muyu Li⁴, Kwok-Leung Chan³

Affiliations

¹ Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China.
² University of Chinese Academy of Sciences, Beijing 100049, China.
³ Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China.
⁴ Centre for Intelligent Multidimensional Data Analysis Limited, Hong Kong, China.

Abstract

Monocular 3D target detection based on pseudo-LiDAR data has made progress in recent years, but pseudo-LiDAR methods remain less robust than LiDAR-based algorithms. Through in-depth experiments, we found that the main limitations are the inaccurate position of the target and the uncertainty in the depth distribution of the foreground target, both of which arise from inaccurate depth estimation. To address these problems, we propose two solutions. The first is a novel method based on joint image segmentation and geometric constraints that predicts the target depth and provides a confidence measure for that prediction. The predicted target depth is fused with the overall depth of the scene to obtain the optimal target position. The second uses the target scale, normalized with a Gaussian function, as a priori information, which reduces the uncertainty of the depth distribution that appears as long-tail noise. We then convert the refined depth map into a point cloud representation, called a pseudo-LiDAR point cloud, and input it to a LiDAR-based algorithm to detect the 3D target. We conducted extensive experiments on the challenging KITTI dataset. The results demonstrate that our framework outperforms various state-of-the-art methods by more than 12.37% and 5.34% on the easy and hard settings of the KITTI validation subset, respectively. On the KITTI test set, it also outperforms state-of-the-art methods by 5.1% and 1.76% on the easy and hard settings, respectively.
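
The conversion of a depth map into a pseudo-LiDAR point cloud can be illustrated with the standard pinhole back-projection. The following is a minimal sketch and not the authors' implementation: the function name `depth_to_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `max_depth` cutoff are all assumptions introduced here for illustration, and the points stay in the camera frame (a real pipeline would also apply the camera-to-LiDAR calibration).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_points(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    max_depth: float = 80.0,  # assumed cutoff in metres, not taken from the paper
) -> List[Point]:
    """Back-project a dense depth map into camera-frame 3D points.

    depth[v][u] is the depth in metres at pixel column u, row v.
    Pixels with non-positive depth or depth beyond max_depth are skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0 or z > max_depth:
                continue
            # Pinhole model: u = fx * x / z + cx, v = fy * y / z + cy
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with hypothetical intrinsics.
toy_depth = [[10.0, 0.0], [20.0, 100.0]]
toy_points = depth_to_points(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

In this toy call, only the two pixels with depth inside the valid range produce points. The resulting list is what a LiDAR-based detector would consume in place of a real scan, which is why errors in the depth map translate directly into misplaced points.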

Keywords: 3D object detection; autonomous driving; deep learning; depth estimation; monocular image; point cloud.

MeSH terms

Algorithms
Deep Learning*
Research Design

Grants and funding

CityU 11202319/Research Grants Council of the Hong Kong Special Administrative Region