Region-based Activity Recognition Using Conditional GAN

Xinyu Li; Yanyi Zhang; Jianyu Zhang; Yueyang Chen; Huangcan Li; Ivan Marsic; Randall S Burd

doi:10.1145/3123266.3123365

Proc ACM Int Conf Multimed. 2017 Oct;2017:1059-1067. doi: 10.1145/3123266.3123365.

Authors

Xinyu Li¹, Yanyi Zhang¹, Jianyu Zhang¹, Yueyang Chen¹, Huangcan Li¹, Ivan Marsic¹, Randall S Burd²

Affiliations

¹ Rutgers University, Piscataway, NJ, 08854, USA.
² Children's National Medical Center, Washington, D.C., 20010, USA.

Abstract

We present a method for activity recognition that first estimates the activity performer's location and then uses that estimate together with the input data to recognize the activity. Existing approaches extract features directly from video frames or entire videos and treat the classifier as a black box. Our method first locates the activity in each input video frame by generating an activity mask with a conditional generative adversarial network (cGAN). The generated mask is appended to the color channels of the input image and fed into a VGG-LSTM network for activity recognition. To test our system, we produced two datasets with manually created masks, one containing Olympic sports activities and the other containing trauma resuscitation activities. Our system makes an activity prediction for each video frame and achieves performance comparable to state-of-the-art systems while simultaneously outlining the location of the activity. We show how the generated masks facilitate the learning of features that are representative of the activity rather than of incidental surrounding information.
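
The step of appending the generated mask to the color channels can be illustrated with a minimal sketch. It assumes an RGB frame stored as a height x width x 3 nested list and a single-channel mask with values in [0, 1]; the abstract does not specify the tensor layout or value range, so these are assumptions, and the names `append_mask_channel`, `frame`, `mask`, and `rgbm` are hypothetical. The cGAN that produces the mask and the VGG-LSTM that consumes the result are not shown.

```python
from typing import List

Pixel = List[float]
Frame = List[List[Pixel]]  # assumed layout: height x width x 3 (RGB)
Mask = List[List[float]]   # assumed layout: height x width, values in [0, 1]


def append_mask_channel(frame: Frame, mask: Mask) -> Frame:
    """Return a new frame whose pixels carry the mask as a fourth channel."""
    if len(frame) != len(mask):
        raise ValueError("frame and mask heights differ")
    out: Frame = []
    for row_pixels, row_mask in zip(frame, mask):
        if len(row_pixels) != len(row_mask):
            raise ValueError("frame and mask widths differ")
        # Copy each RGB pixel and append the mask value at that location.
        out.append([list(px) + [m] for px, m in zip(row_pixels, row_mask)])
    return out


# Toy 2 x 2 frame and mask (illustrative values, not data from the paper).
frame = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
         [[0.7, 0.8, 0.9], [1.0, 0.0, 0.5]]]
mask = [[1.0, 0.0],
        [0.0, 1.0]]

rgbm = append_mask_channel(frame, mask)
```

In a tensor library the same operation would typically be a single concatenation along the channel axis; the nested-list form is used here only to keep the sketch free of dependencies.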

Keywords: Activity Recognition; Deep Learning; Generative Adversarial Network; Localization.

Grants and funding

R01 LM011834/LM/NLM NIH HHS/United States