Depth Map Upsampling via Multi-Modal Generative Adversarial Network

Daniel Stanley Tan; Jun-Ming Lin; Yu-Chi Lai; Joel Ilao; Kai-Lung Hua

doi:10.3390/s19071587

Sensors (Basel). 2019 Apr 2;19(7):1587. doi: 10.3390/s19071587.

Authors

Daniel Stanley Tan¹, Jun-Ming Lin², Yu-Chi Lai³, Joel Ilao⁴, Kai-Lung Hua⁵,⁶

Affiliations

¹ Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan. D10515805@mail.ntust.edu.tw.
² Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan. M10515002@mail.ntust.edu.tw.
³ Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan. yu-chi@mail.ntust.edu.tw.
⁴ Center for Automation Research, College of Computer Studies, De La Salle University, Manila 1004, Philippines. joel.ilao@dlsu.edu.ph.
⁵ Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan. hua@mail.ntust.edu.tw.
⁶ Center for Cyber-Physical System Innovation, National Taiwan University of Science and Technology, Taipei 10607, Taiwan. hua@mail.ntust.edu.tw.

Abstract

Autonomous robots for smart homes and smart cities mostly require depth perception in order to interact with their environments. However, depth maps are usually captured at a lower resolution than RGB color images because of inherent sensor limitations. Naively increasing their resolution often leads to loss of sharpness and incorrect estimates, especially in regions with depth discontinuities or depth boundaries. In this paper, we propose a novel Generative Adversarial Network (GAN)-based framework for depth map super-resolution that preserves both the smooth areas and the sharp edges at the boundaries of the depth map. Our proposed model is trained on two different modalities, namely color images and depth maps. At test time, however, it requires only the depth map to produce a higher-resolution version. We evaluated our model both quantitatively and qualitatively, and our experiments show that our method performs better than existing state-of-the-art models.
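The problem with naive upsampling can be seen in a small sketch. The example below is not the paper's method; it only illustrates the failure mode the abstract describes, assuming a toy one-dimensional depth profile with a single step between two surfaces. The function names `upsample_linear` and `upsample_nearest` and the depth values are hypothetical, chosen for illustration.

```python
import numpy as np

def upsample_linear(row, factor):
    """Linearly interpolate a 1-D depth profile to `factor` times its length."""
    n = len(row)
    src = np.arange(n)                        # original sample positions
    dst = np.linspace(0, n - 1, n * factor)   # denser positions over the same span
    return np.interp(dst, src, row)

def upsample_nearest(row, factor):
    """Repeat each sample `factor` times (nearest-neighbour upsampling)."""
    return np.repeat(row, factor)

# Toy depth profile (hypothetical values): a near surface at 1.0 m
# and a far surface at 3.0 m, separated by one sharp depth boundary.
low_res = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0])

linear = upsample_linear(low_res, 4)
nearest = upsample_nearest(low_res, 4)

# Depth values produced by linear interpolation that belong to neither surface.
spurious = np.setdiff1d(np.round(linear, 6), low_res)

print("linear  :", np.round(linear, 2))
print("nearest :", nearest)
print("spurious depths from linear interpolation:", spurious)
```

Linear interpolation fills the boundary with intermediate depths that correspond to no real surface, which is the loss of sharpness and the incorrect estimates mentioned above. Nearest-neighbour repetition keeps only the original depth values but cannot place the edge more precisely than the low-resolution grid allows.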

Keywords: depth upsampling; encoder-decoder networks; generative adversarial networks.
