NRVC: Neural Representation for Video Compression with Implicit Multiscale Fusion Network

Shangdong Liu; Puming Cao; Yujian Feng; Yimu Ji; Jiayuan Chen; Xuedong Xie; Longji Wu

doi:10.3390/e25081167

Entropy (Basel). 2023 Aug 4;25(8):1167. doi: 10.3390/e25081167.

Authors

Shangdong Liu¹, Puming Cao¹, Yujian Feng¹, Yimu Ji¹, Jiayuan Chen¹, Xuedong Xie¹, Longji Wu¹

Affiliation

¹ School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.

Abstract

Recently, end-to-end deep models for video compression have advanced steadily. However, these models rely on lengthy, complex pipelines that contain numerous redundant parameters. Video compression approaches based on implicit neural representation (INR) instead represent a video directly as a function approximated by a neural network, which yields a more lightweight model. Their reliance on a single feature extraction pipeline, however, limits the network's ability to fit the mapping function for video frames. We therefore propose a neural representation approach for video compression with an implicit multiscale fusion network (NRVC), which uses normalized residual networks to improve the effectiveness of INR in fitting the target function. We propose the multiscale representations for video compression (MSRVC) network, which effectively extracts features from the input video sequence to enhance the degree to which the mapping function overfits the video. Additionally, we propose the feature extraction channel attention (FECA) block, which captures interaction information between different feature extraction channels and further improves the effectiveness of feature extraction. The results show that, at a similar bits per pixel (BPP), NRVC achieves a 2.16% higher decoded peak signal-to-noise ratio (PSNR) than the NeRV method. Moreover, NRVC outperforms the conventional HEVC in terms of PSNR.
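The two metrics named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names are hypothetical, pixels are assumed to be 8-bit (peak value 255), and BPP is assumed to be the stored size of the network weights divided by the number of pixels in the video, with any entropy coding of the weights ignored.

```python
import math


def psnr(reference, decoded, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values.

    max_value=255.0 assumes 8-bit pixels (an assumption, not from the paper).
    """
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def bits_per_pixel(num_params, bits_per_param, num_frames, height, width):
    """BPP when the network weights themselves are the compressed video.

    Assumes every parameter is stored with the same bit width and that
    no entropy coding is applied afterwards.
    """
    total_bits = num_params * bits_per_param
    total_pixels = num_frames * height * width
    return total_bits / total_pixels
```

For example, under these assumptions a hypothetical model with 3,000,000 parameters stored at 8 bits each, representing 132 frames of 720 × 1280 pixels, would be evaluated with `bits_per_pixel(3_000_000, 8, 132, 720, 1280)`.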

Keywords: attention mechanism; implicit neural representation; video compression.

Grants and funding

This project was funded by the National Natural Science Foundation of China (No. 62176264), the Natural Science Foundation of Jiangsu Province (Higher Education Institutions) (20KJA520001), and the Open Research Project of Zhejiang Lab (No. 2021KF0AB05).