Self-supervised depth super-resolution with contrastive multiview pre-training

Xin Qiao; Chenyang Ge; Chaoqiang Zhao; Fabio Tosi; Matteo Poggi; Stefano Mattoccia

doi:10.1016/j.neunet.2023.09.023


Neural Netw. 2023 Nov:168:223-236. doi: 10.1016/j.neunet.2023.09.023. Epub 2023 Sep 21.

Authors

Xin Qiao¹, Chenyang Ge², Chaoqiang Zhao³, Fabio Tosi⁴, Matteo Poggi⁴, Stefano Mattoccia⁴

Affiliations

¹ Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, China.
² Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, 710049, China. Electronic address: cyge@mail.xjtu.edu.cn.
³ National Key Laboratory of Air-based Information Perception and Fusion, Luoyang, 471000, China; Luoyang Institute of Electro-Optical Equipment of AVIC, Luoyang, 471000, China.
⁴ Department of Computer Science and Engineering, University of Bologna, Bologna, 40136, Italy.

PMID: 37769459

Abstract

Many low-level vision tasks, including guided depth super-resolution (GDSR), suffer from a shortage of paired training data. Self-supervised learning is a promising solution, but upsampling depth maps without explicit supervision from high-resolution target images remains challenging. To alleviate this problem, we propose a self-supervised depth super-resolution method with contrastive multiview pre-training. Unlike existing contrastive learning methods designed for classification or segmentation, our strategy can be applied to regression tasks, even when trained on a small-scale dataset, and can reduce information redundancy by extracting features unique to the guide. Furthermore, we propose a novel mutual modulation scheme that effectively computes the local spatial correlation between cross-modal features. Extensive experiments demonstrate that our method outperforms state-of-the-art GDSR methods and generalizes well to other modalities.
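The abstract does not specify the pre-training loss or the modulation formula. As a rough illustration only, the pure-Python sketch below assumes a standard InfoNCE objective for contrasting paired depth and guide embeddings, and a simple windowed cosine-similarity attention for the local spatial correlation between cross-modal features. The names `info_nce`, `local_modulate`, and `mutual_modulation`, the temperature `tau`, and the window `radius` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # [row][col] -> per-pixel channel vector


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two channel vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def info_nce(depth_emb: List[Vector], guide_emb: List[Vector], tau: float = 0.1) -> float:
    """Generic InfoNCE (an assumption, not the paper's stated loss): sample i of
    one view is the positive for sample i of the other view; all other samples
    in the batch serve as negatives."""
    n = len(depth_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(depth_emb[i], guide_emb[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the positive pair
    return total / n


def local_modulate(query: FeatureMap, key: FeatureMap, radius: int = 1) -> FeatureMap:
    """Replace each pixel of `key` by a weighted average of its (2r+1)x(2r+1)
    neighbourhood, weighted by softmax(cosine(query pixel, key neighbour)).
    Out-of-bounds neighbours are skipped rather than padded."""
    h, w = len(query), len(query[0])
    out: FeatureMap = []
    for y in range(h):
        row = []
        for x in range(w):
            neigh, sims = [], []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        neigh.append(key[yy][xx])
                        sims.append(cosine(query[y][x], key[yy][xx]))
            m = max(sims)
            weights = [math.exp(s - m) for s in sims]  # unnormalised softmax
            z = sum(weights)
            channels = len(key[y][x])
            row.append([sum(wt * v[c] for wt, v in zip(weights, neigh)) / z
                        for c in range(channels)])
        out.append(row)
    return out


def mutual_modulation(depth: FeatureMap, guide: FeatureMap,
                      radius: int = 1) -> Tuple[FeatureMap, FeatureMap]:
    """Hypothetical two-way variant: depth features re-weight the guide
    neighbourhood, and guide features re-weight the depth neighbourhood."""
    return local_modulate(depth, guide, radius), local_modulate(guide, depth, radius)
```

With `radius=0`, each pixel attends only to itself, so `local_modulate` returns the key map unchanged; larger radii mix in neighbouring features in proportion to their similarity with the query feature at that location. In `info_nce`, matched depth/guide pairs yield a loss near zero, while mismatched pairings push it up, which is the signal a contrastive pre-training stage would minimise.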

Keywords: Contrastive pre-training; Cross-modal; Depth super-resolution; Mutual-modulation; Self-supervised learning.

MeSH terms

Neural Networks, Computer*