Exploring High-order Spatio-temporal Correlations from Skeleton for Person Re-identification

IEEE Trans Image Process. 2023 Jan 17:PP. doi: 10.1109/TIP.2023.3236144. Online ahead of print.

Abstract

Person re-identification (Re-ID) has become a hot research topic due to its widespread applications. Conducting person Re-ID on video sequences is a practical requirement, where the crucial challenge is to pursue a robust video representation based on spatial and temporal features. However, most previous methods only consider how to integrate part-level features over the spatio-temporal range, while how to model and generate part correlations is rarely explored. In this paper, we propose a skeleton-based dynamic hypergraph framework, namely the Skeletal Temporal Dynamic Hypergraph Neural Network (ST-DHGNN), for person Re-ID, which models the high-order correlations among various body parts based on a time series of skeletal information. Specifically, multi-shape and multi-scale patches are heuristically cropped from feature maps, constituting spatial representations in different frames. A joint-centered hypergraph and a bone-centered hypergraph are constructed in parallel from multiple body parts (i.e., head, trunk, and legs) with spatio-temporal multi-granularity over the entire video sequence, in which graph vertices represent regional features and hyperedges denote their relationships. Dynamic hypergraph propagation, comprising a re-planning module and a hyperedge-elimination module, is proposed to better integrate features among vertices. Feature aggregation and attention mechanisms are also adopted to obtain a better video representation for person Re-ID. Experiments show that the proposed method performs significantly better than the state of the art on three video-based person Re-ID datasets: iLIDS-VID, PRID-2011, and MARS.
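The abstract does not give the propagation equations of ST-DHGNN, but the core idea of passing features between vertices (regional part features) and hyperedges (their relationships) can be illustrated with a minimal, generic hypergraph propagation step in the style of standard hypergraph neural networks. The incidence matrix `H`, the toy features, and the simple mean aggregation below are illustrative assumptions, not the authors' actual formulation:

```python
import numpy as np

def hypergraph_propagate(X, H):
    """One generic hypergraph propagation step (illustrative sketch).

    X: (|V|, d) vertex features, e.g. regional body-part features.
    H: (|V|, |E|) binary incidence matrix; H[v, e] = 1 if vertex v
       belongs to hyperedge e (a hyperedge can join many vertices).
    """
    d_v = H.sum(axis=1)  # vertex degrees
    d_e = H.sum(axis=0)  # hyperedge degrees
    # Vertex-to-edge step: each hyperedge averages its member vertices.
    edge_feats = (H.T @ X) / d_e[:, None]
    # Edge-to-vertex step: each vertex averages its incident hyperedges.
    return (H @ edge_feats) / d_v[:, None]

# Toy example: 4 vertices, 2 hyperedges e0 = {v0, v1}, e1 = {v1, v2, v3}.
H = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 1]], dtype=float)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
out = hypergraph_propagate(X, H)
# v1 sits in both hyperedges, so its updated feature mixes
# mean(v0, v1) = 1.5 and mean(v1, v2, v3) = 3.0 into 2.25.
```

In the paper's setting, the incidence matrix would be built from joint- and bone-centered groupings across frames and then updated dynamically (re-planning and hyperedge elimination), rather than fixed as in this sketch.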