Extremely Lightweight Skeleton-Based Action Recognition With ShiftGCN++

IEEE Trans Image Process. 2021;30:7333-7348. doi: 10.1109/TIP.2021.3104182. Epub 2021 Aug 20.

Abstract

In skeleton-based action recognition, graph convolutional networks (GCNs) have achieved remarkable success. However, current GCN-based methods have two shortcomings. First, the computational cost is heavy, typically over 15 GFLOPs for one action sample, and some recent works even reach ~100 GFLOPs. Second, the receptive fields of both the spatial graph and the temporal graph are inflexible. Although recent works introduce incremental adaptive modules to enhance the expressiveness of the spatial graph, their efficiency is still limited by regular GCN structures. In this paper, we propose a shift graph convolutional network (ShiftGCN) to overcome both shortcomings. ShiftGCN is composed of novel shift graph operations and lightweight point-wise convolutions, where the shift graph operations provide flexible receptive fields for both the spatial graph and the temporal graph. To further boost efficiency, we introduce four techniques and build a more lightweight skeleton-based action recognition model named ShiftGCN++. ShiftGCN++ is an extremely computation-efficient model designed for low-power, low-cost devices with very limited computing power. On three datasets for skeleton-based action recognition, ShiftGCN notably exceeds the state-of-the-art methods with over 10× fewer FLOPs and a 4× practical speedup. ShiftGCN++ further boosts the efficiency of ShiftGCN, achieving comparable performance with 6× fewer FLOPs and a 2× practical speedup.
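
To make the idea of a shift graph operation followed by a point-wise convolution concrete, here is a minimal NumPy sketch. The abstract does not specify the shift pattern, so the rule used here (channel c at joint v reads from joint (v + c) mod N) is an assumption made for illustration, as are the function names `spatial_shift` and `shift_gcn_layer`, the tensor layout (frames, joints, channels), and the example sizes. It should not be read as the authors' implementation.

```python
import numpy as np


def spatial_shift(x):
    """Shift features across joints, one offset per channel.

    x has shape (T, N, C): frames, joints, channels (assumed layout).
    Assumed rule: channel c at joint v takes its value from joint
    (v + c) mod N, so every joint gathers features from other joints
    without any multiplications.
    """
    _, n_joints, n_channels = x.shape
    joint_idx = np.arange(n_joints)[:, None]       # shape (N, 1)
    channel_idx = np.arange(n_channels)[None, :]   # shape (1, C)
    source_joint = (joint_idx + channel_idx) % n_joints  # shape (N, C)
    # Gather: out[t, v, c] = x[t, source_joint[v, c], c]
    return x[:, source_joint, channel_idx]


def shift_gcn_layer(x, weight):
    """Shift, then mix channels with a point-wise (1x1) convolution.

    weight has shape (C_in, C_out); the matrix product acts on each
    (frame, joint) position independently.
    """
    return spatial_shift(x) @ weight


# Illustrative sizes only: 8 frames, 25 joints, 16 -> 32 channels.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 25, 16))
weight = rng.standard_normal((16, 32))
output = shift_gcn_layer(features, weight)
```

In this sketch the shift itself costs no multiply-accumulate operations; all arithmetic sits in the point-wise product, which is one way to see why such a design can be cheaper than a regular graph convolution that multiplies by adjacency matrices.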