Wavelet Pyramid Recurrent Structure-Preserving Attention Network for Single Image Super-Resolution

IEEE Trans Neural Netw Learn Syst. 2023 Jul 13:PP. doi: 10.1109/TNNLS.2023.3289958. Online ahead of print.

Abstract

Many single image super-resolution (SISR) methods based on convolutional neural networks (CNNs) learn the mapping between low- and high-resolution images directly, without considering context structure and detail fidelity. This can limit the potential of CNNs and produce unrealistic, distorted edges and textures in the reconstructed images. A more effective approach is to incorporate prior knowledge about the image into the model to aid reconstruction. In this study, we propose a novel recurrent structure-preserving mechanism that uses the multiscale wavelet transform (WT) as an image prior, termed the wavelet pyramid recurrent structure-preserving attention network (WRSANet), to process the low- and high-frequency subnetworks at each level separately and recursively. We propose a novel structure scale preservation (SSP) architecture that differs from traditional WTs and allows structure-preservation subnetworks to be incorporated and learned at each level. By using the proposed structure scale fusion (SSF) combined with the inverse WT, we can recursively restore and preserve rich low-frequency image structure by combining SSP outputs across levels. Furthermore, we propose novel low-to-high-frequency information transmission (L2HIT) and detail enhancement (DE) mechanisms to address detail distortion in the high-frequency components by transferring information from the structure-preservation subnetworks. This preserves the low-frequency structure while reconstructing high-frequency details, improving detail fidelity and avoiding structural distortion. Finally, a joint loss function balances the fusion of low- and high-frequency information to different degrees, with hyperparameters adjusted during training. The experimental results demonstrate that the proposed WRSANet achieves better performance and visual quality than state-of-the-art (SOTA) methods on synthetic and real datasets, especially in terms of context structure and texture details.
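
As a hedged illustration of the multiscale wavelet prior described above (not the authors' implementation), the sketch below uses PyWavelets to build a wavelet pyramid in which each level splits the current low-frequency (LL) sub-band into a coarser LL and three high-frequency detail sub-bands, and then reconstructs the image recursively with the inverse WT, analogous to the level-wise restoration that SSF performs before the learned subnetworks are involved. The wavelet family ("haar") and the number of levels are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of a multiscale 2-D wavelet decomposition / reconstruction,
# assuming the 'haar' wavelet and a two-level pyramid for illustration.
import numpy as np
import pywt


def wavelet_pyramid(image: np.ndarray, levels: int = 2, wavelet: str = "haar"):
    """Decompose an image into a pyramid of wavelet sub-bands.

    At each level, LL carries the low-frequency structure and the
    (LH, HL, HH) triplet carries the high-frequency details; the next
    level decomposes the current LL again.
    """
    pyramid = []
    ll = image
    for _ in range(levels):
        ll, highs = pywt.dwt2(ll, wavelet)  # one-level 2-D DWT
        pyramid.append(highs)               # keep high-frequency details
    return ll, pyramid                      # coarsest LL + per-level details


def wavelet_reconstruct(ll: np.ndarray, pyramid, wavelet: str = "haar"):
    """Apply the inverse WT recursively, from the coarsest level upward."""
    for highs in reversed(pyramid):
        ll = pywt.idwt2((ll, highs), wavelet)
    return ll


# Usage: a round trip on a random "image" reproduces the input.
img = np.random.rand(64, 64).astype(np.float32)
ll, details = wavelet_pyramid(img, levels=2)
rec = wavelet_reconstruct(ll, details)
assert np.allclose(rec, img, atol=1e-5)
```

In WRSANet, as the abstract describes, learned components (the SSP subnetworks on the low-frequency path and the L2HIT/DE mechanisms on the high-frequency path) would operate on these LL and detail sub-bands at each level before the inverse transform; the sketch only shows the plain decomposition and reconstruction scaffolding.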