Video Super-Resolution via a Spatio-Temporal Alignment Network

IEEE Trans Image Process. 2022;31:1761-1773. doi: 10.1109/TIP.2022.3146625. Epub 2022 Feb 8.

Abstract

Deep convolutional neural network based video super-resolution (SR) models have achieved significant progress in recent years. Existing deep video SR methods usually use optical flow to warp the neighboring frames for temporal alignment. However, accurate estimation of optical flow is difficult, and flow errors tend to produce artifacts in the super-resolved results. To address this problem, we propose a novel end-to-end deep convolutional network that dynamically generates spatially adaptive filters for alignment, which are constituted by the local spatio-temporal channels of each pixel. Our method avoids explicit motion compensation and instead uses spatio-temporal adaptive filters to perform alignment, which effectively fuses multi-frame information and improves the temporal consistency of the video. Building on the proposed adaptive filter, we develop a reconstruction network that takes the aligned frames as input to restore the high-resolution frames. In addition, we employ residual modules embedded with channel attention as the basic unit to extract more informative features for video SR. Quantitative and qualitative evaluation results on three public video datasets demonstrate that the proposed method performs favorably against state-of-the-art super-resolution methods in terms of clarity and texture details.
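
To make the idea of per-pixel spatio-temporal filtering concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes grayscale frames and that the filters are already given; in the proposed method they would be predicted by the network. The function name `apply_adaptive_filters`, the tensor layout, and the 3x3 window size are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def apply_adaptive_filters(frames, filters, k=3):
    """Fuse T frames into one aligned frame using a separate filter per pixel.

    frames:  (T, H, W) array of neighboring low-resolution frames (assumed layout).
    filters: (H, W, T, k, k) array holding one spatio-temporal kernel per output
             pixel (assumed layout; a network would normally predict these).
    Returns an (H, W) array.
    """
    T, H, W = frames.shape
    assert filters.shape == (H, W, T, k, k)
    pad = k // 2
    # Replicate border pixels so every output pixel has a full k x k window.
    padded = np.pad(frames, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros((H, W), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            # All frames shifted by the current kernel offset: shape (T, H, W).
            patch = padded[:, dy:dy + H, dx:dx + W]
            # Per-pixel weights for this offset, moved to shape (T, H, W).
            w = np.moveaxis(filters[:, :, :, dy, dx], -1, 0)
            # Weighted sum over the temporal axis, accumulated over offsets.
            out += (patch * w).sum(axis=0)
    return out
```

Because each output pixel has its own kernel spanning both the spatial window and the temporal axis, no explicit flow field or warping step is needed in this sketch; a kernel that puts all its weight on one offset in one frame simply copies that pixel.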