Learning-Based Just-Noticeable-Quantization-Distortion Modeling for Perceptual Video Coding

Sehwan Ki; Sung-Ho Bae; Munchurl Kim; Hyunsuk Ko

doi:10.1109/TIP.2018.2818439

IEEE Trans Image Process. 2018 Jul;27(7):3178-3193. doi: 10.1109/TIP.2018.2818439.

PMID: 29641399

Abstract

Conventional predictive video coding approaches are reaching the limit of their achievable coding-efficiency gains, because further improvements demand a steeply increasing computational complexity. As an alternative, perceptual video coding (PVC) attempts to achieve high coding efficiency by eliminating perceptual redundancy through just-noticeable-distortion (JND)-directed coding. Previous JND models were obtained by adding white Gaussian noise or specific signal patterns to the original images, which is not appropriate for finding JND thresholds for distortion that involves energy reduction. In this paper, we present a novel discrete cosine transform (DCT)-based energy-reduced JND model, called ERJND, that is better suited to JND-based PVC schemes. The proposed ERJND model is then extended to two learning-based just-noticeable-quantization-distortion (JNQD) models that serve as preprocessing for perceptual video coding. Both JNQD models automatically adjust JND levels according to the given quantization step size. The first, called LR-JNQD, is based on linear regression and determines the JNQD model parameter from extracted handcrafted features. The second, called CNN-JNQD, is based on a convolutional neural network (CNN). To the best of our knowledge, this is the first approach that automatically adjusts JND levels according to quantization step sizes when preprocessing the input to video encoders. In experiments, the LR-JNQD and CNN-JNQD models were applied to high efficiency video coding (HEVC) and yielded maximum (average) bitrate reductions of 38.51% (10.38%) and 67.88% (24.91%), respectively, with little subjective video quality degradation, compared with coding the input without preprocessing.
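The abstract does not give the ERJND or JNQD formulas, so the following Python sketch only illustrates the general idea of energy-reducing, quantization-aware JND preprocessing of a single image block in the DCT domain. AC coefficients are shrunk toward zero by at most a threshold that grows with the quantization step size, so the block loses energy but each coefficient changes by no more than that threshold. The linear scaling in the hypothetical `qstep_scaled_threshold` function (with its assumed `slope` and `intercept` values), the single flat `base_jnd` shared by all coefficients, and the use of soft-thresholding are illustrative assumptions. They are not the authors' ERJND, LR-JNQD, or CNN-JNQD models.

```python
import math


def _dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so that X = C x C^T for an n x n block x."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def _transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """Forward 2-D orthonormal DCT-II of a square block."""
    c = _dct_matrix(len(block))
    return _matmul(_matmul(c, block), _transpose(c))


def idct2(coeffs):
    """Inverse of dct2 (C is orthonormal, so its inverse is its transpose)."""
    c = _dct_matrix(len(coeffs))
    return _matmul(_matmul(_transpose(c), coeffs), c)


def qstep_scaled_threshold(base_jnd, qstep, slope=0.05, intercept=1.0):
    """HYPOTHETICAL linear model: the JND threshold grows with the quantization step."""
    return base_jnd * (intercept + slope * qstep)


def jnd_preprocess(block, base_jnd, qstep):
    """Energy-reducing preprocessing sketch; returns (filtered pixels, filtered DCT coefficients)."""
    coeffs = dct2(block)
    t = qstep_scaled_threshold(base_jnd, qstep)
    out = []
    for u, row in enumerate(coeffs):
        new_row = []
        for v, c in enumerate(row):
            if u == 0 and v == 0:
                new_row.append(c)  # keep the DC term so the block mean is preserved
            else:
                # Soft-threshold: shrink |c| by at most t, never flip its sign.
                new_row.append(math.copysign(max(abs(c) - t, 0.0), c))
        out.append(new_row)
    return idct2(out), out


if __name__ == "__main__":
    block = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
    for q in (10, 40, 80):
        _, coeffs = jnd_preprocess(block, base_jnd=2.0, qstep=q)
        zeroed = sum(1 for r in coeffs for x in r if x == 0.0)
        print(f"qstep={q}: {zeroed} of 16 coefficients suppressed")
```

Because the threshold rises with `qstep`, coarser quantization lets the preprocessor remove more detail that the encoder would likely distort anyway. A practical DCT-domain JND model would typically use a different threshold per frequency position and adapt it to local luminance and texture rather than using one flat value as this sketch does.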