Adversarial Learning for Joint Optimization of Depth and Ego-Motion

Anjie Wang; Zhijun Fang; Yongbin Gao; Songchao Tan; Shanshe Wang; Siwei Ma; Jenq-Neng Hwang

doi:10.1109/TIP.2020.2968751

IEEE Trans Image Process. 2020 Jan 28. doi: 10.1109/TIP.2020.2968751. Online ahead of print.

PMID: 32011252
DOI: 10.1109/TIP.2020.2968751

Abstract

In recent years, supervised deep learning methods have shown great promise in dense depth estimation. However, large volumes of high-quality training data are expensive and impractical to acquire. Alternatively, self-supervised learning-based depth estimators can learn the latent transformation from monocular or binocular video sequences by minimizing the photometric warp error between consecutive frames, but they suffer from the scale ambiguity problem or have difficulty in estimating precise pose changes between frames. In this paper, we propose a joint self-supervised deep learning pipeline for depth and ego-motion estimation that combines adversarial learning with joint optimization under spatial-temporal geometric constraints. The stereo reconstruction error provides the spatial geometric constraint needed to estimate depth at absolute scale. Meanwhile, the absolute-scale depth map and a pre-trained pose network serve as a good starting point for direct visual odometry (DVO). DVO optimization based on spatial geometric constraints yields a fine-grained ego-motion estimate and provides additional backpropagation signals to the depth estimation network. Finally, the reconstructed views from the spatial and temporal domains are concatenated, and an iterative coupled optimization process is combined with adversarial learning for accurate depth and precise ego-motion estimation. The experimental results show superior performance compared with state-of-the-art methods for monocular depth and ego-motion estimation on the KITTI dataset, as well as strong generalization ability of the proposed approach.
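To make the spatial constraint mentioned in the abstract concrete, the following is a minimal NumPy sketch of a stereo reconstruction error on a rectified image pair. It is an illustration under stated assumptions, not the authors' implementation: the function names (`warp_right_to_left`, `photometric_l1`, `disparity_to_depth`) are hypothetical, the images are single-channel, the warp uses plain 1-D linear interpolation along each row, and the loss is a simple mean absolute difference. The focal length and baseline passed to `disparity_to_depth` are illustrative values, not parameters reported in the paper.

```python
import numpy as np


def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x).

    Assumes a rectified pair, so the warp is purely horizontal.
    `right` and `disparity` are 2-D arrays of the same shape.
    """
    h, w = right.shape
    # Source column in the right image for every left-image pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation between the two neighbouring columns.
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]


def photometric_l1(target, reconstructed):
    """Mean absolute photometric error between two images."""
    return float(np.mean(np.abs(target - reconstructed)))


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Metric depth from disparity: depth = focal * baseline / disparity."""
    return focal_px * baseline_m / np.maximum(disparity, 1e-6)


# Synthetic check: a scene with a constant disparity of 2 pixels.
rng = np.random.default_rng(0)
left = rng.random((4, 16))
right = np.roll(left, -2, axis=1)        # right[:, x] == left[:, x + 2]
disp = np.full(left.shape, 2.0)
recon = warp_right_to_left(right, disp)
# Columns 0 and 1 have no valid source pixel, so they are excluded.
error = photometric_l1(left[:, 2:], recon[:, 2:])
```

Because the stereo baseline is known in metres, a disparity that minimizes this error maps to depth in metric units, which is the sense in which the stereo term can resolve the scale ambiguity of purely monocular training. A temporal counterpart would replace the horizontal shift with a reprojection driven by the predicted depth and camera pose.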