RepECN: Making ConvNets Better Again for Efficient Image Super-Resolution

Qiangpu Chen; Jinghui Qin; Wushao Wen

doi:10.3390/s23239575

RepECN: Making ConvNets Better Again for Efficient Image Super-Resolution

Sensors (Basel). 2023 Dec 2;23(23):9575. doi: 10.3390/s23239575.

Authors

Qiangpu Chen¹, Jinghui Qin², Wushao Wen¹

Affiliations

¹ School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China.
² School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China.

Abstract

Traditional Convolutional Neural Network (ConvNet, CNN)-based image super-resolution (SR) methods have lower computation costs, making them more friendly for real-world scenarios. However, they suffer from lower performance. On the contrary, Vision Transformer (ViT)-based SR methods have achieved impressive performance recently, but these methods often suffer from high computation costs and model storage overhead, making them hard to meet the requirements in practical application scenarios. In practical scenarios, an SR model should reconstruct an image with high quality and fast inference. To handle this issue, we propose a novel CNN-based Efficient Residual ConvNet enhanced with structural Re-parameterization (RepECN) for a better trade-off between performance and efficiency. A stage-to-block hierarchical architecture design paradigm inspired by ViT is utilized to keep the state-of-the-art performance, while the efficiency is ensured by abandoning the time-consuming Multi-Head Self-Attention (MHSA) and by re-designing the block-level modules based on CNN. Specifically, RepECN consists of three structural modules: a shallow feature extraction module, a deep feature extraction, and an image reconstruction module. The deep feature extraction module comprises multiple ConvNet Stages (CNS), each containing 6 Re-Parameterization ConvNet Blocks (RepCNB), a head layer, and a residual connection. The RepCNB utilizes larger kernel convolutions rather than MHSA to enhance the capability of learning long-range dependence. In the image reconstruction module, an upsampling module consisting of nearest-neighbor interpolation and pixel attention is deployed to reduce parameters and maintain reconstruction performance, while bicubic interpolation on another branch allows the backbone network to focus on learning high-frequency information. The extensive experimental results on multiple public benchmarks show that our RepECN can achieve 2.5∼5× faster inference than the state-of-the-art ViT-based SR model with better or competitive super-resolving performance, indicating that our RepECN can reconstruct high-quality images with fast inference.

Keywords: ConvNet; image super-resolution; structural re-parameterization.

Abstract

Grants and funding