Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations

Zhilu Zhang; Ruohao Wang; Hongzhi Zhang; Wangmeng Zuo

doi:10.1109/TPAMI.2024.3379736

IEEE Trans Pattern Anal Mach Intell. 2024 Mar 20:PP. doi: 10.1109/TPAMI.2024.3379736. Online ahead of print.

PMID: 38507386

Abstract

In this paper, we consider two challenging issues in reference-based super-resolution (RefSR) for smartphones: (i) how to choose a proper reference image, and (ii) how to learn RefSR in a self-supervised manner. In particular, we propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms. First, considering the popularity of multiple cameras in modern smartphones, the more zoomed (telephoto) image can be naturally leveraged as the reference to guide the super-resolution (SR) of the less zoomed (ultra-wide) image, which gives us a chance to learn a deep network that performs SR from dual zoomed observations (DZSR). Second, for self-supervised learning of DZSR, we take the telephoto image, instead of an additional high-resolution image, as the supervision information, and select a center patch from it as the reference to super-resolve the corresponding ultra-wide image patch. To mitigate the effect of misalignment between the ultra-wide low-resolution (LR) patch and the telephoto ground-truth (GT) image during training, we first adopt patch-based optical flow alignment to obtain the warped LR, then design an auxiliary-LR to guide the deformation of the warped LR features. To generate visually pleasing results, we present a local overlapped sliced Wasserstein loss to better represent the perceptual difference between the GT and the output in feature space. During testing, DZSR can be directly deployed to super-resolve the whole ultra-wide image with the telephoto image as the reference. In addition, we take multiple zoomed observations to explore self-supervised RefSR, and present a progressive fusion scheme for the effective utilization of reference images. Experiments show that our methods achieve better quantitative and qualitative performance than state-of-the-art methods. The code and pre-trained models will be publicly available.
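
For readers unfamiliar with the sliced Wasserstein distance that the loss above builds on, the following is a minimal NumPy sketch of the generic, global form of that distance between two sets of feature vectors. It is not the authors' implementation: the abstract does not specify how the local overlapped regions are formed, which feature extractor is used, or how many projections are taken, so the function name `sliced_wasserstein`, the `num_projections` and `seed` parameters, and the L1 comparison of sorted projections are all assumptions made for illustration.

```python
import numpy as np


def sliced_wasserstein(feat_a, feat_b, num_projections=64, seed=0):
    """Approximate sliced Wasserstein-1 distance between two feature sets.

    feat_a, feat_b: arrays of shape (N, C), i.e. N feature vectors of
    dimension C (for example, a flattened H*W feature map with C channels).
    Both arrays must have the same shape (an assumption of this sketch).
    """
    feat_a = np.asarray(feat_a, dtype=np.float64)
    feat_b = np.asarray(feat_b, dtype=np.float64)
    if feat_a.ndim != 2 or feat_a.shape != feat_b.shape:
        raise ValueError("expected two (N, C) arrays of identical shape")

    rng = np.random.default_rng(seed)
    channels = feat_a.shape[1]

    # Draw random unit directions in feature space, one per column.
    directions = rng.standard_normal((channels, num_projections))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)

    # Project every feature vector onto every direction: shape (N, P).
    proj_a = feat_a @ directions
    proj_b = feat_b @ directions

    # In 1-D, optimal transport between equal-size sets pairs sorted values.
    proj_a = np.sort(proj_a, axis=0)
    proj_b = np.sort(proj_b, axis=0)

    # Average the 1-D transport cost over samples and directions.
    return float(np.mean(np.abs(proj_a - proj_b)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    gt_features = rng.standard_normal((256, 16))
    out_features = gt_features + 0.5  # hypothetical shifted output features
    print(sliced_wasserstein(gt_features, out_features))
```

Because each projection is sorted before comparison, the distance depends on the distribution of feature values rather than on their exact spatial positions, which is one reason such a measure can be more tolerant of small residual misalignment than a pixel-wise loss.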