CSMOT: Make One-Shot Multi-Object Tracking in Crowded Scenes Great Again

Sensors (Basel). 2023 Apr 6;23(7):3782. doi: 10.3390/s23073782.

Abstract

Current one-shot multi-object tracking (MOT) algorithms are dominated by the joint detection and embedding paradigm, which offers high inference speed and accuracy, but their tracking performance is unstable in crowded scenes. The detection branch has difficulty obtaining accurate object positions, and the ambiguous appearance features extracted by the re-identification (re-ID) branch lead to identity switches. To address these problems, this paper proposes a more robust MOT algorithm, named CSMOT, based on FairMOT. First, on the basis of the encoder-decoder network, a coordinate attention module is designed to enhance the information interaction between channels (horizontal and vertical coordinates), which improves the network's object-detection ability. Then, an angle-center loss that effectively maximizes intra-class similarity is proposed to optimize the re-ID branch, making the extracted re-ID features more discriminative. We further redesign the re-ID feature dimension to balance the detection and re-ID tasks. Finally, a simple and effective data association mechanism is introduced, which associates every detection, rather than only the high-score detections, during tracking. Experimental results show that our one-shot MOT algorithm achieves excellent tracking performance on multiple public datasets and can be effectively applied to crowded scenes. In particular, CSMOT decreases the number of ID switches by 11.8% and 33.8% on the MOT16 and MOT17 test datasets, respectively, compared to the baseline.

Keywords: angle-center loss; coordinate attention; data association; multi-object tracking; one-shot; re-ID.
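
Illustrative sketch (not the authors' implementation): the abstract states that the data association step associates every detection rather than only the high-score ones. One plausible way to picture this is a two-stage matching in which high-score detections are matched to existing tracks first, and the remaining tracks are then matched against the low-score detections. The sketch below assumes greedy IoU matching on bounding boxes only; the function names, the `high_thresh` and `min_iou` values, and the omission of re-ID embeddings and motion prediction are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(
    tracks: Dict[int, Box], dets: List[Tuple[int, Box]], min_iou: float
) -> Dict[int, int]:
    """Greedily pair track ids with detection indices, best IoU first."""
    pairs = sorted(
        ((iou(tb, db), tid, di) for tid, tb in tracks.items() for di, db in dets),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, di in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if tid in matches or di in used:
            continue
        matches[tid] = di
        used.add(di)
    return matches


def associate(
    tracks: Dict[int, Box],
    boxes: List[Box],
    scores: List[float],
    high_thresh: float = 0.6,  # hypothetical value, not from the paper
    min_iou: float = 0.3,      # hypothetical value, not from the paper
) -> Dict[int, int]:
    """Return {track_id: detection_index}, using every detection."""
    indexed = list(enumerate(zip(boxes, scores)))
    high = [(i, b) for i, (b, s) in indexed if s >= high_thresh]
    low = [(i, b) for i, (b, s) in indexed if s < high_thresh]

    # Stage 1: match tracks to high-score detections.
    matches = greedy_match(tracks, high, min_iou)

    # Stage 2: give still-unmatched tracks a chance with low-score detections.
    remaining = {tid: b for tid, b in tracks.items() if tid not in matches}
    matches.update(greedy_match(remaining, low, min_iou))
    return matches
```

In this sketch, a partially occluded pedestrian whose detection score drops below `high_thresh` can still be linked to its existing track in the second stage, instead of the track being left unmatched. A practical tracker would typically also use appearance embeddings, a motion model, and an optimal assignment solver rather than greedy matching.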