Dual-Recommendation Disentanglement Network for View Fuzz in Action Recognition

IEEE Trans Image Process. 2023:32:2719-2733. doi: 10.1109/TIP.2023.3273459. Epub 2023 May 16.

Abstract

Multi-view action recognition aims to identify action categories from given clues. Existing studies ignore the negative influences of fuzzy views between view and action in disentangling, commonly arising the mistaken recognition results. To this end, we regard the observed image as the composition of the view and action components, and give full play to the advantages of multiple views via the adaptive cooperative representation among these two components, forming a Dual-Recommendation Disentanglement Network (DRDN) for multi-view action recognition. Specifically, 1) For the action, we leverage a multi-level Specific Information Recommendation (SIR) to enhance the interaction among intricate activities and views. SIR offers a more comprehensive representation of activities, measuring the trade-off between global and local information. 2) For the view, we utilize a Pyramid Dynamic Recommendation (PDR) to learn a complete and detailed global representation by transferring features from different views. It is explicitly restricted to resist the fuzzy noise influence, focusing on positive knowledge from other views. Our DRDN aims for complete action and view representation, where PDR directly guides action to disentangle with view features and SIR considers mutual exclusivity of view and action clues. Extensive experiments have indicated that the multi-view action recognition method DRDN we proposed achieves state-of-the-art performance over powerful competitors on several standard benchmarks. The code will be available at https://github.com/51cloud/DRDN.