Antidecay LSTM for Siamese Tracking With Adversarial Learning

IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4475-4489. doi: 10.1109/TNNLS.2020.3018025. Epub 2021 Oct 5.

Abstract

Visual tracking is a fundamental task in computer vision with many challenges, mainly due to changes in the target's appearance in the temporal and spatial domains. Recently, numerous trackers have modeled target appearance well in the spatial domain by utilizing deep convolutional features. However, most of these CNN-based trackers consider only the appearance variations between two consecutive frames of a video sequence. In addition, some trackers model target appearance over the long term by applying recurrent neural networks (RNNs), but the decay of the target's features degrades tracking performance. In this article, we propose the antidecay long short-term memory (AD-LSTM) for Siamese tracking. Specifically, we extend the architecture of the standard LSTM in two ways for the visual tracking task. First, we replace all of the fully connected layers with convolutional layers to extract features that preserve spatial structure. Second, we improve the architecture of the cell unit so that information about the target's appearance can flow through the AD-LSTM without decay for as long as possible in the temporal domain. Meanwhile, since there is no ground truth for the feature maps generated by the AD-LSTM, we propose an adversarial learning algorithm to optimize it. With the help of adversarial learning, the Siamese network generates response maps more accurately, and the AD-LSTM generates feature maps of the target more robustly. Experimental results show that our tracker performs favorably against state-of-the-art trackers on six challenging benchmarks: OTB-100, TC-128, VOT2016, VOT2017, GOT-10k, and TrackingNet.
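
To make the first architectural change concrete, the following is a minimal sketch (assuming PyTorch) of a convolutional LSTM cell of the kind the abstract describes: the fully connected layers of a standard LSTM are replaced with convolutions so the hidden and cell states keep their spatial structure. The paper's exact antidecay cell equations are not given in the abstract, so the cell update below is the standard one, with a comment marking where the antidecay modification would apply; all class names, channel counts, and shapes are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ConvLSTMCell(nn.Module):
        """Convolutional LSTM cell: fully connected gate layers are
        replaced with convolutions, so hidden and cell states are
        feature maps of shape (N, hidden_channels, H, W)."""

        def __init__(self, in_channels: int, hidden_channels: int,
                     kernel_size: int = 3):
            super().__init__()
            padding = kernel_size // 2
            # One convolution produces all four gates at once
            # (input, forget, cell candidate, output).
            self.gates = nn.Conv2d(in_channels + hidden_channels,
                                   4 * hidden_channels,
                                   kernel_size, padding=padding)

        def forward(self, x, state):
            h, c = state
            z = self.gates(torch.cat([x, h], dim=1))
            i, f, g, o = torch.chunk(z, 4, dim=1)
            i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
            g = torch.tanh(g)
            # Standard LSTM update. The AD-LSTM's second change would
            # modify this step so that repeated application of the
            # forget pathway does not attenuate the target's features
            # over long sequences (details are in the paper, not the
            # abstract).
            c = f * c + i * g
            h = o * torch.tanh(c)
            return h, (h, c)

    # Hypothetical usage on a Siamese-backbone feature map:
    cell = ConvLSTMCell(in_channels=256, hidden_channels=256)
    x = torch.randn(1, 256, 22, 22)
    h = c = torch.zeros(1, 256, 22, 22)
    h, (h, c) = cell(x, (h, c))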