Self-Supervised 3D Behavior Representation Learning Based on Homotopic Hyperbolic Embedding

Jinghong Chen; Zhihao Jin; Qicong Wang; Hongying Meng

doi:10.1109/TIP.2023.3328230

IEEE Trans Image Process. 2023;32:6061-6074. doi: 10.1109/TIP.2023.3328230. Epub 2023 Nov 8.

PMID: 37917516

Abstract

Behavior sequences are generated by a series of spatio-temporal interactions and have a high-dimensional nonlinear manifold structure, which makes it difficult to learn 3D behavior representations without supervised signals. Self-supervised learning methods address this by exploiting the rich information contained in the data itself. Context-context contrastive self-supervised methods construct a manifold embedded in Euclidean space by learning the distance relationships between data points, and thereby recover the geometric distribution of the data. However, traditional Euclidean space has difficulty expressing joint contextual features. To obtain an effective global representation from the relationships between unlabeled data, this paper adopts contrastive learning to compare global features and proposes a self-supervised learning method based on hyperbolic embedding to mine the nonlinear relationships of behavior trajectories. The method adopts a framework that discards negative samples, which overcomes a shortcoming of the paradigm based on positive and negative samples: pushing similar data apart in the feature space. Meanwhile, the output of the network is embedded in a hyperbolic space, and a multi-layer perceptron is added so that, by exploiting the geometric properties of operations in hyperbolic space, the entire module becomes a homotopic mapping and acquires homotopy-invariant knowledge. The proposed method combines the geometric properties of hyperbolic manifolds with the equivariance of homotopy groups to provide better supervisory signals for the network, which improves the performance of unsupervised learning.
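
The abstract does not say which model of hyperbolic space or which loss the authors use. As a rough illustration of the general idea only, the sketch below assumes the Poincare ball model with curvature -c. It maps Euclidean network outputs into the ball with the exponential map at the origin and compares two augmented views by their geodesic distance, with no negative samples involved. The function names (expmap0, poincare_distance, negative_free_loss) and the choice of mean hyperbolic distance as the objective are hypothetical stand-ins, not the paper's formulation.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin of the Poincare ball (curvature -c).

    Maps Euclidean vectors v of shape (..., d) to points with norm < 1/sqrt(c).
    """
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_distance(x, y, c=1.0, eps=1e-9):
    """Geodesic distance between points x and y inside the Poincare ball."""
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    arg = 1.0 + 2.0 * c * sq_diff / np.maximum(denom, eps)
    # arccosh is only defined for arguments >= 1; guard against rounding error.
    return np.arccosh(np.maximum(arg, 1.0)) / np.sqrt(c)

def negative_free_loss(view_a, view_b, c=1.0):
    """Assumed objective: mean hyperbolic distance between two views.

    view_a and view_b are Euclidean features of shape (batch, d) produced
    from two augmentations of the same sequences. Only positive pairs are
    compared, so no negative samples are needed.
    """
    za = expmap0(view_a, c)
    zb = expmap0(view_b, c)
    return float(np.mean(poincare_distance(za, zb, c)))
```

For example, with c = 1 the vector (0.3, 0.4) has Euclidean norm 0.5, and its image under expmap0 lies at hyperbolic distance 2 x 0.5 = 1 from the origin. Note that an objective of this kind, used without negatives, normally needs an additional mechanism such as a predictor head or a stop-gradient branch to avoid collapse to a constant embedding. Those details are outside what the abstract states.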