Action-Driven Visual Object Tracking With Deep Reinforcement Learning

IEEE Trans Neural Netw Learn Syst. 2018 Jun;29(6):2239-2252. doi: 10.1109/TNNLS.2018.2801826.

Abstract

In this paper, we propose an efficient visual tracker that directly captures a bounding box containing the target object in a video by means of sequential actions learned using deep neural networks. The proposed deep neural network, which controls the tracking actions, is pretrained on various training video sequences and fine-tuned during actual tracking for online adaptation to changes in the target and background. The pretraining is done using deep reinforcement learning (RL) as well as supervised learning. The use of RL enables even partially labeled data to be successfully utilized for semisupervised learning. In an evaluation on the object tracking benchmark data set, the proposed tracker achieves competitive performance at three times the speed of existing deep network-based trackers. The fast version of the proposed method, which operates in real time on a graphics processing unit, outperforms the state-of-the-art real-time trackers with an accuracy improvement of more than 8%.
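
The idea of capturing a bounding box through sequential actions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the action set, the step size, or the stopping rule, so the names ACTIONS, apply_action, and track_frame, the 3% step, and the 20-step cap are all assumptions made for illustration. In the paper the policy is a deep network that looks at the image; here it is a hypothetical callable that sees only the current box.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Box:
    """Bounding box given by its center and size, in pixels."""
    cx: float
    cy: float
    w: float
    h: float

# Assumed discrete action set; the abstract does not list the real one.
ACTIONS = ("left", "right", "up", "down", "scale_up", "scale_down", "stop")

def apply_action(box: Box, action: str, step: float = 0.03) -> Box:
    """Return the box after one action; `step` is an assumed fraction of box size."""
    dx, dy = step * box.w, step * box.h
    if action == "left":
        return Box(box.cx - dx, box.cy, box.w, box.h)
    if action == "right":
        return Box(box.cx + dx, box.cy, box.w, box.h)
    if action == "up":
        return Box(box.cx, box.cy - dy, box.w, box.h)
    if action == "down":
        return Box(box.cx, box.cy + dy, box.w, box.h)
    if action == "scale_up":
        return Box(box.cx, box.cy, box.w * (1 + step), box.h * (1 + step))
    if action == "scale_down":
        return Box(box.cx, box.cy, box.w * (1 - step), box.h * (1 - step))
    if action == "stop":
        return box
    raise ValueError(f"unknown action: {action}")

def track_frame(box: Box, policy: Callable[[Box], str], max_steps: int = 20) -> Box:
    """Apply actions chosen by `policy` until it says "stop" or the step cap is hit."""
    for _ in range(max_steps):
        action = policy(box)
        if action == "stop":
            break
        box = apply_action(box, action)
    return box

# Toy stand-in for the learned network: drift right until the center passes x = 110.
def toy_policy(box: Box) -> str:
    return "right" if box.cx < 110 else "stop"

result = track_frame(Box(100.0, 100.0, 50.0, 50.0), toy_policy)
```

In this sketch the box from the previous frame is the starting point, and each frame costs only a handful of cheap policy evaluations. That is one plausible reading of why an action-driven tracker can be fast, though the abstract itself does not explain the source of the speedup.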

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Deep Learning*
  • Humans
  • Nonlinear Dynamics
  • Pattern Recognition, Automated
  • Reinforcement, Psychology*
  • Video Recording
  • Visual Perception / physiology*