Comparison between Recurrent Networks and Temporal Convolutional Networks Approaches for Skeleton-Based Action Recognition

Mihai Nan; Mihai Trăscău; Adina Magda Florea; Cezar Cătălin Iacob

doi:10.3390/s21062051

Sensors (Basel). 2021 Mar 15;21(6):2051. doi: 10.3390/s21062051.

Authors

Mihai Nan¹, Mihai Trăscău¹, Adina Magda Florea¹, Cezar Cătălin Iacob¹

Affiliation

¹ Faculty of Automatic Control and Computers, University POLITEHNICA of Bucharest, RO-060042 Bucharest, Romania.

Abstract

Action recognition plays an important role in various applications such as video monitoring, automatic video indexing, crowd analysis, human-machine interaction, smart homes and personal assistive robotics. In this paper, we propose improvements to several methods for human action recognition from videos that operate on data represented as skeleton poses. These methods are based on the most widely used techniques for this problem: Graph Convolutional Networks (GCNs), Temporal Convolutional Networks (TCNs) and Recurrent Neural Networks (RNNs). We first explore and compare different ways to extract the most relevant spatial and temporal characteristics from a sequence of frames describing an action. Based on this comparative analysis, we show how a TCN-type unit can be extended to also operate on the characteristics extracted from the spatial domain. To validate our approach, we test it on a benchmark often used for human action recognition problems and show that our solution obtains results comparable to the state of the art, with a significant increase in inference speed.

Keywords: action recognition; recurrent networks; sequence-to-sequence; temporal convolutional networks.
