Single image super-resolution with denoising diffusion GANS

Sci Rep. 2024 Feb 21;14(1):4272. doi: 10.1038/s41598-024-52370-3.

Abstract

Single image super-resolution (SISR) refers to the reconstruction of a high-resolution (HR) image from a corresponding low-resolution (LR) input. Because a single low-resolution image corresponds to many possible high-resolution images, the problem is ill-posed. In recent years, generative model-based SISR methods have outperformed conventional SISR methods. However, SISR methods based on GANs, VAEs, and normalizing flows suffer from unstable training, low sample quality, and high computational cost, and these models struggle to achieve diverse, high-quality, and fast sampling at the same time. In particular, denoising diffusion probabilistic models produce impressively diverse, high-quality samples, but their expensive sampling cost prevents them from being widely applied in the real world. In this paper, we show that the fundamental reason for the slow sampling of diffusion-based SISR methods lies in the Gaussian assumption used in previous diffusion models, which holds only for small step sizes. We propose Single Image Super-Resolution with Denoising Diffusion GANs (SRDDGAN) to achieve large-step denoising, sample diversity, and training stability. Our approach combines denoising diffusion models with GANs to generate images conditionally, using a multimodal conditional GAN to model each denoising step. SRDDGAN outperforms existing diffusion model-based methods on PSNR and perceptual quality metrics, while the added latent variable Z explores the diversity of plausible HR outputs. Notably, SRDDGAN infers nearly 11 times faster than the diffusion-based SR3, making it a more practical solution for real-world applications.
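
The abstract describes modelling each denoising step with a multimodal conditional GAN so that only a few large reverse steps are needed. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' released implementation: a toy conditional generator predicts a clean image from the noisy image, the upsampled LR condition, and a latent Z, and the next state is drawn from the usual DDPM Gaussian posterior. All names (Generator, reverse_step) and the noise schedule are hypothetical.

```python
# Minimal sketch of one large reverse step in a conditional denoising-diffusion GAN.
# Assumptions (not from the paper): network architecture, schedule, and names.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 4                                   # few, large denoising steps (assumed)
betas = torch.linspace(1e-4, 0.5, T)    # assumed noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class Generator(nn.Module):
    """Toy stand-in for the multimodal conditional generator."""
    def __init__(self, channels=3, z_dim=64):
        super().__init__()
        self.z_proj = nn.Linear(z_dim, channels)
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x_t, t, lr_up, z):
        # Broadcast the timestep as an extra map; inject the latent Z per channel.
        b, c, h, w = x_t.shape
        t_map = torch.full((b, 1, h, w), float(t) / T, device=x_t.device)
        x = torch.cat([x_t, lr_up, t_map], dim=1)
        return self.net(x) + self.z_proj(z).view(b, c, 1, 1)  # predicted x0

def reverse_step(G, x_t, t, lr_up, z):
    """Sample x_{t-1} from the Gaussian posterior q(x_{t-1} | x_t, x0_hat)."""
    x0_hat = G(x_t, t, lr_up, z)
    if t == 0:
        return x0_hat
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
    coef_x0 = torch.sqrt(ab_prev) * betas[t] / (1 - ab_t)
    coef_xt = torch.sqrt(alphas[t]) * (1 - ab_prev) / (1 - ab_t)
    var = betas[t] * (1 - ab_prev) / (1 - ab_t)
    mean = coef_x0 * x0_hat + coef_xt * x_t
    return mean + torch.sqrt(var) * torch.randn_like(x_t)

# Usage: start from noise and iterate the few GAN-modelled steps,
# resampling Z to obtain diverse HR candidates for the same LR input.
G = Generator()
lr_up = F.interpolate(torch.rand(1, 3, 16, 16), scale_factor=4, mode="bicubic")
x = torch.randn(1, 3, 64, 64)
for t in reversed(range(T)):
    x = reverse_step(G, x, t, lr_up, torch.randn(1, 64))
```

In this reading, the GAN generator replaces the small-step Gaussian denoiser, which is what allows the schedule to use only a handful of large steps; the latent Z is what makes each step multimodal.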