Learning Self-Corrective Network via Adaptive Self-Labeling and Dynamic NMS for High-Performance Long-Term Tracking

IEEE Trans Neural Netw Learn Syst. 2023 Nov 7:PP. doi: 10.1109/TNNLS.2023.3327486. Online ahead of print.

Abstract

This article presents a self-corrective network-based long-term tracker (SCLT) comprising a self-modulated tracking reliability evaluator (STRE) and a self-adjusting proposal postprocessor (SPPP). Targets in long-term sequences often undergo severe appearance variations. Existing long-term trackers typically update their models online to adapt to these variations, but inaccurate tracking results introduce cumulative error into the updated model, which may cause severe drift. A robust long-term tracker should therefore have a self-corrective capability: first, it should judge whether a tracking result is reliable; second, it should recapture the target when serious challenges (e.g., full occlusion and out-of-view) cause severe drift. To address the first issue, the STRE builds an effective tracking reliability classifier on a modulation subnetwork. The classifier is trained on samples with pseudo labels generated by an adaptive self-labeling strategy. Adaptive self-labeling uses the statistical characteristics of the target state to automatically label the hard negative samples that existing trackers often neglect, and the network modulation mechanism guides the backbone network to learn more discriminative features without extra training data. To address the second issue, once the STRE has been triggered, the SPPP applies a dynamic non-maximum suppression (NMS) to recapture the target promptly and accurately. Compared to the commonly used greedy NMS, the proposed dynamic NMS uses an adaptive strategy to handle both the in-view and out-of-view conditions, and can thereby select the most probable object box, which is essential for accurately updating the basic tracker online. In addition, the STRE and the SPPP transfer well to other trackers: performance improves when they are combined with multiple baselines.
Extensive evaluations on four large-scale and challenging benchmark datasets (VOT2021LT, OxUvALT, TLP, and LaSOT) demonstrate the superiority of the proposed SCLT over a variety of state-of-the-art long-term trackers on all measures. Source code and demos can be found at https://github.com/TJUT-CV/SCLT.
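
For readers unfamiliar with the baseline that the abstract contrasts against, the following is a minimal sketch of the commonly used greedy NMS. It is not the proposed dynamic NMS, whose adaptive strategy the abstract does not specify. The function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box format, and the default threshold of 0.5 are illustrative assumptions, not taken from the SCLT code.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # illustrative default, fixed for all frames
) -> List[int]:
    """Standard greedy NMS: return indices of kept boxes, best score first."""
    # Visit candidates from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box
        # that was already kept (all of which have higher or equal scores).
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(greedy_nms(proposals, confidences))
```

Note that the threshold here is a fixed constant applied identically in every frame. The abstract's point is that the dynamic NMS instead adapts its behavior to whether the target is in view or out of view.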