Camera-Assisted Video Saliency Prediction and Its Applications

IEEE Trans Cybern. 2018 Sep;48(9):2520-2530. doi: 10.1109/TCYB.2017.2741498. Epub 2017 Dec 21.

Abstract

Video saliency prediction is an indispensable yet challenging technique that facilitates applications such as video surveillance, autonomous driving, and realistic rendering. Motivated by the popularity of embedded cameras, we predict region-level saliency in videos by leveraging human gaze locations recorded with a camera (e.g., one built into an iMac or a laptop PC). Our proposed camera-assisted mechanism improves saliency prediction by discovering the regions that humans attend to inside a video clip. It is orthogonal to current saliency models, i.e., any existing video/image saliency model can be boosted by our mechanism. First, spatial- and temporal-level visual features are exploited collaboratively to calculate an initial saliency map. We observe that current saliency models are not sufficiently adaptable to variations in lighting, different view angles, and complicated backgrounds. Therefore, assisted by a camera that tracks human gaze movements, a non-negative matrix factorization algorithm is designed to accurately localize the semantically/visually salient video regions perceived by humans. Finally, the learned human gaze locations and the initial saliency map are integrated to optimize video saliency calculation. Empirical results demonstrate that: 1) our approach achieves state-of-the-art video saliency prediction accuracy, considerably outperforming 11 mainstream algorithms, and 2) our method can conveniently and successfully enhance video retargeting, quality estimation, and summarization.
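The abstract does not give the factorization objective or the fusion rule, so the following is only a minimal sketch of the two generic ingredients it names: a non-negative matrix factorization and the combination of a gaze-derived map with an initial saliency map. Everything specific here is an assumption made for illustration and not the authors' formulation: the region-by-frame gaze-count matrix, the standard Lee-Seung multiplicative updates, the rank, the helper names `nmf`, `normalize`, and `fuse`, and the linear blend weight `alpha`.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) as W @ H with W, H >= 0.

    Uses the standard Lee-Seung multiplicative updates for the squared
    Frobenius error. This is a generic NMF, assumed for illustration.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def normalize(x):
    """Rescale an array to [0, 1]; a constant array maps to all zeros."""
    x = x - x.min()
    span = x.max()
    return x / span if span > 0 else np.zeros_like(x)


def fuse(initial_saliency, gaze_map, alpha=0.5):
    """Hypothetical fusion rule: a linear blend of two normalized maps."""
    blended = alpha * normalize(initial_saliency) + (1 - alpha) * normalize(gaze_map)
    return normalize(blended)


# Toy data (assumed layout): rows are video regions, columns are frames,
# entries count how often the recorded gaze fell in that region.
gaze_counts = np.random.default_rng(42).poisson(2.0, size=(6, 10)).astype(float)
W, H = nmf(gaze_counts, rank=2)

# Low-rank reconstruction averaged over frames gives one score per region.
gaze_score = (W @ H).mean(axis=1)

# Stand-in for the initial saliency of each region from visual features.
initial = np.linspace(0.0, 1.0, 6)
fused = fuse(initial, gaze_score, alpha=0.5)
```

In this sketch the low-rank reconstruction acts as a smoothed estimate of where gaze concentrates, and `alpha` controls how much the visual-feature map is trusted relative to the gaze evidence; both choices are placeholders for whatever the paper actually specifies.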

MeSH terms

  • Algorithms
  • Fixation, Ocular / physiology*
  • Humans
  • Image Processing, Computer-Assisted / methods*
  • Models, Statistical
  • Video Recording / methods*