Stereoscopic Vision Recalling Memory for Monocular 3D Object Detection

IEEE Trans Image Process. 2023;32:2749-2760. doi: 10.1109/TIP.2023.3274479. Epub 2023 May 19.

Abstract

Monocular 3D object detection has drawn increasing attention in various human-related applications, such as autonomous vehicles, because it is cost-effective. However, a monocular image alone inherently contains insufficient information to infer 3D information. In this paper, we propose a new monocular 3D object detector that can recall stereoscopic visual information about an object, given a left-view monocular image. First, we devise a location embedding module that handles each object with awareness of its location. Next, given the object's appearance in the left-view monocular image, we devise a Monocular-to-Stereoscopic (M2S) memory that can recall the object's right-view appearance along with its depth information. To this end, we introduce a stereoscopic vision memorizing loss that guides the M2S memory to store stereoscopic visual information. Furthermore, we propose a binocular vision association loss that guides the M2S memory to associate left-view and right-view information about the object when estimating depth. As a result, our monocular 3D object detector with the M2S memory can effectively exploit the recalled stereoscopic visual information in the inference phase. Comprehensive experimental results on two public datasets, the KITTI 3D Object Detection Benchmark and the Waymo Open Dataset, demonstrate the effectiveness of the proposed method. We claim that our method is a step forward, as it follows the human ability to recall stereoscopic visual information even when one eye is closed.
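The abstract does not specify how the M2S memory is addressed or read. Purely as an illustration of the general idea of "recalling" paired information from a single-view query, the sketch below assumes a generic key-value memory read: soft attention over memory slots, a technique from the memory-network literature that is not confirmed as this paper's design. Keys stand in for left-view object features and values for the paired right-view/depth features. The names `recall`, `keys`, `values`, and `temperature` are hypothetical and do not come from the paper.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; lower temperature gives sharper weights."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def recall(query: Vector, keys: List[Vector], values: List[Vector],
           temperature: float = 0.1) -> Vector:
    """Hypothetical key-value memory read (assumption, not the paper's M2S memory).

    query:  feature of an object taken from the left-view (monocular) image.
    keys:   memory slots addressed by left-view appearance.
    values: paired slots holding "recalled" right-view / depth features.
    Returns the attention-weighted sum of the value slots.
    """
    # Address the memory: similarity between the query and every key slot.
    weights = softmax([cosine(query, k) for k in keys], temperature)
    # Read the memory: blend the paired value slots by those weights.
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


if __name__ == "__main__":
    # Two toy memory slots; the query lies close to the first key,
    # so the recalled feature is dominated by the first value slot.
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0, 0.0], [0.0, 5.0, 2.0]]
    out = recall([0.9, 0.1], keys, values)
    print([round(x, 3) for x in out])
```

In a learned detector the keys and values would be trainable parameters, and losses such as the memorizing and association losses named in the abstract would shape what they store; this toy version only shows the read step with fixed slots.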