Improving Depth Estimation by Embedding Semantic Segmentation: A Hybrid CNN Model

Sensors (Basel). 2022 Feb 21;22(4):1669. doi: 10.3390/s22041669.

Abstract

Single-image depth estimation methods can fail to separate foreground elements because these are easily confounded with the background. To alleviate this problem, we propose a semantic segmentation procedure that adds information to a depth estimator, in this case a 3D Convolutional Neural Network (CNN); the segmentation is coded as one-hot planes representing categories of objects. We explore 2D and 3D models. In particular, we propose a hybrid 2D-3D CNN architecture capable of obtaining semantic segmentation and depth estimation at the same time. We tested our procedure on the SYNTHIA-AL dataset and obtained σ3=0.95 using manual segmentation, an improvement of 0.14 points over the state of the art (σ3=0.81), and σ3=0.89 using automatic semantic segmentation, showing that depth estimation improves when the shape and position of objects in a scene are known.
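
As an illustration of two ideas mentioned in the abstract, the following minimal sketch shows how a segmentation map might be coded as one-hot planes and how a threshold accuracy such as σ3 could be computed. It is not the authors' implementation. The function names `one_hot_planes` and `threshold_accuracy` are hypothetical, the toy label map and depth values are invented for the example, and the definition of σ3 is an assumption: the abstract does not define it, so the sketch uses the convention common in depth estimation, the fraction of pixels whose ratio max(predicted/true, true/predicted) is below 1.25^3.

```python
def one_hot_planes(label_map, num_classes):
    """Turn an H x W map of integer class ids into num_classes binary H x W planes."""
    return [
        [[1 if label == c else 0 for label in row] for row in label_map]
        for c in range(num_classes)
    ]


def threshold_accuracy(pred, truth, power=3, base=1.25):
    """Fraction of valid pixels with max(pred/truth, truth/pred) < base**power.

    Assumed definition of the sigma_k metric; the source abstract does not spell it out.
    """
    limit = base ** power
    hits = 0
    total = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p <= 0 or t <= 0:
                continue  # skip pixels without a valid positive depth
            total += 1
            if max(p / t, t / p) < limit:
                hits += 1
    return hits / total if total else 0.0


# Toy 2 x 3 segmentation map with three hypothetical object categories (0, 1, 2).
labels = [[0, 0, 1],
          [2, 1, 1]]
planes = one_hot_planes(labels, num_classes=3)

# Toy predicted and ground-truth depth maps (arbitrary units).
pred = [[10.0, 20.0, 5.0], [4.0, 8.0, 30.0]]
truth = [[10.0, 10.0, 5.0], [4.0, 8.0, 10.0]]
acc = threshold_accuracy(pred, truth)
```

Each plane is a binary mask for one category, so stacking the planes alongside the image gives the depth estimator an explicit cue about object shape and position. In the toy depth maps, two of the six pixels are off by a factor of 2 or 3, which exceeds 1.25^3 ≈ 1.95, so only four pixels count as correct.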

Keywords: 3D CNN; depth estimation; hybrid convolutional neural networks; semantic segmentation.

MeSH terms

  • Image Processing, Computer-Assisted*
  • Neural Networks, Computer
  • Semantics*