A Video Summarization Model Based on Deep Reinforcement Learning with Long-Term Dependency

Sensors (Basel). 2022 Oct 10;22(19):7689. doi: 10.3390/s22197689.

Abstract

Deep summarization models have succeeded in the video summarization field based on the development of gated recursive unit (GRU) and long and short-term memory (LSTM) technology. However, for some long videos, GRU and LSTM cannot effectively capture long-term dependencies. This paper proposes a deep summarization network with auxiliary summarization losses to address this problem. We introduce an unsupervised auxiliary summarization loss module with LSTM and a swish activation function to capture the long-term dependencies for video summarization, which can be easily integrated with various networks. The proposed model is an unsupervised framework for deep reinforcement learning that does not depend on any labels or user interactions. Additionally, we implement a reward function (R(S)) that jointly considers the consistency, diversity, and representativeness of generated summaries. Furthermore, the proposed model is lightweight and can be successfully deployed on mobile devices and enhance the experience of mobile users and reduce pressure on server operations. We conducted experiments on two benchmark datasets and the results demonstrate that our proposed unsupervised approach can obtain better summaries than existing video summarization methods. Furthermore, the proposed algorithm can generate higher F scores with a nearly 6.3% increase on the SumMe dataset and a 2.2% increase on the TVSum dataset compared to the DR-DSN model.

Keywords: auxiliary summarization loss; long-term dependency; reinforcement learning; unsupervised learning; video summarization.

MeSH terms

  • Algorithms*
  • Memory, Long-Term* / physiology