Learned Video Compression With Efficient Temporal Context Learning

IEEE Trans Image Process. 2023;32:3188-3198. doi: 10.1109/TIP.2023.3276333. Epub 2023 Jun 2.

Abstract

In contrast to image compression, the key to video compression is to exploit temporal context efficiently so as to reduce inter-frame redundancy. Existing learned video compression methods generally rely on short-term temporal correlations or image-oriented codecs, which limits further improvement of coding performance. This paper proposes a novel temporal context-based video compression network (TCVC-Net) to improve the performance of learned video compression. Specifically, a global temporal reference aggregation (GTRA) module is proposed to obtain an accurate temporal reference for motion-compensated prediction by aggregating long-term temporal context. Furthermore, to efficiently compress the motion vectors and residues, a temporal conditional codec (TCC) is proposed that preserves structural and detailed information by exploiting the multi-frequency components of the temporal context. Experimental results show that the proposed TCVC-Net outperforms state-of-the-art methods in terms of both PSNR and MS-SSIM.
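The abstract only names the two modules, so the PyTorch snippet below is a loose sketch of the ideas involved, not the paper's architecture: (i) a GTRA-style aggregation that fuses a buffer of previously decoded features into one temporal reference via learned attention weights, and (ii) a TCC-style conditional codec that feeds the reference into the encoder and decoder as a condition rather than subtracting it as a residual. All module internals, channel counts, and the softmax/average-pooling choices are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GTRA(nn.Module):
    """GTRA-style aggregation (assumed form): weight each past feature map
    with a learned per-pixel score and sum, yielding one temporal reference."""

    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, past_feats):
        # past_feats: list of (N, C, H, W) features from previously decoded frames
        stacked = torch.stack(past_feats, dim=1)             # (N, T, C, H, W)
        n, t, c, h, w = stacked.shape
        scores = self.score(stacked.view(n * t, c, h, w))    # (N*T, 1, H, W)
        weights = torch.softmax(scores.view(n, t, 1, h, w), dim=1)
        reference = (weights * stacked).sum(dim=1)           # (N, C, H, W)
        return self.fuse(reference)


class TCC(nn.Module):
    """TCC-style conditional codec (assumed form): the temporal reference is
    concatenated as a condition at the encoder and, downsampled to a crude
    low-frequency version, again at the decoder."""

    def __init__(self, channels):
        super().__init__()
        self.enc = nn.Conv2d(2 * channels, channels, kernel_size=5, stride=2, padding=2)
        self.dec = nn.ConvTranspose2d(2 * channels, channels, kernel_size=5,
                                      stride=2, padding=2, output_padding=1)
        self.pool = nn.AvgPool2d(2)  # stand-in for a low-frequency component

    def forward(self, x, condition):
        latent = self.enc(torch.cat([x, condition], dim=1))   # conditional encoding
        cond_low = self.pool(condition)                       # low-frequency context
        return self.dec(torch.cat([latent, cond_low], dim=1))


if __name__ == "__main__":
    gtra, tcc = GTRA(64), TCC(64)
    past = [torch.randn(1, 64, 32, 32) for _ in range(4)]  # toy feature buffer
    reference = gtra(past)                                  # aggregated temporal reference
    recon = tcc(torch.randn(1, 64, 32, 32), reference)      # conditionally coded feature
    print(recon.shape)  # torch.Size([1, 64, 32, 32])
```

The design point this illustrates is conditional rather than residual coding: the codec receives the temporal prediction as an input on both sides instead of transmitting the frame minus its prediction, which is consistent with how the abstract describes the TCC.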