Object Activity Scene Description, Construction, and Recognition

IEEE Trans Cybern. 2021 Oct;51(10):5082-5092. doi: 10.1109/TCYB.2019.2904901. Epub 2021 Oct 12.

Abstract

Action recognition is a critical task for social robots that need to engage meaningfully with their environment, and 3-D human skeleton-based action recognition has been an active research area in recent years. Although existing approaches perform well at action recognition, recognizing a group of actions in an activity scene remains a great challenge. To tackle this problem, we first partition the scene into several primitive actions (PAs) based on a motion attention mechanism. The PAs are then described by the trajectory vectors of the corresponding joints. Finally, motivated by text classification based on word embedding, we employ a convolutional neural network (CNN) to recognize activity scenes, treating the motion of joints as the "words" of an activity. Experimental results on a dataset of human activity scenes show the efficiency of the proposed approach.
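The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the motion attention mechanism, the trajectory encoding, or the CNN architecture, so every function name, the path-length ranking used as a stand-in for motion attention, and the random untrained filters are assumptions made only for illustration. The sketch computes per-joint trajectory vectors from a skeleton sequence, keeps the most active joints, and applies a text-CNN-style convolution over time with max-over-time pooling.

```python
import numpy as np


def trajectory_vectors(skeleton):
    """skeleton: (T, J, 3) joint positions. Returns (T-1, J, 3) frame-to-frame displacements."""
    return np.diff(skeleton, axis=0)


def most_active_joints(traj, k):
    """Assumed stand-in for motion attention: keep the k joints with the longest path."""
    path_length = np.linalg.norm(traj, axis=2).sum(axis=0)  # shape (J,)
    top_k = np.argsort(path_length)[::-1][:k]
    return np.sort(top_k)


def joint_words(traj, joints):
    """One 'word' vector per time step: displacements of the selected joints, shape (T-1, 3*k)."""
    return traj[:, joints, :].reshape(traj.shape[0], -1)


def text_cnn_features(words, filters):
    """Text-CNN-style feature extraction.

    words:   (N, D) sequence of word vectors.
    filters: (F, W, D) bank of F filters spanning W time steps.
    Returns (F,) features after valid convolution over time, ReLU, and max-over-time pooling.
    """
    n_steps = words.shape[0]
    width = filters.shape[1]
    windows = np.stack([words[t:t + width] for t in range(n_steps - width + 1)])
    responses = np.einsum("twd,fwd->tf", windows, filters)  # (N-W+1, F)
    return np.maximum(responses, 0.0).max(axis=0)


# Toy scene: 20 frames, 5 joints, only joints 1 and 3 move.
skeleton = np.zeros((20, 5, 3))
skeleton[:, 1, 0] = np.linspace(0.0, 1.0, 20)
skeleton[:, 3, 1] = np.linspace(0.0, 2.0, 20)

traj = trajectory_vectors(skeleton)
joints = most_active_joints(traj, k=2)
words = joint_words(traj, joints)

# Random, untrained filters (assumption); a real model would learn these from labeled scenes.
rng = np.random.default_rng(0)
filters = rng.standard_normal((4, 3, words.shape[1]))
features = text_cnn_features(words, filters)
```

In a trained model, the pooled feature vector would feed a classifier over activity scene labels; here it only shows how a sequence of joint "words" can be treated like a sentence of word embeddings.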