Skeleton-Based Attention Mask for Pedestrian Attribute Recognition Network

Sorn Sooksatra; Sitapa Rujikietgumjorn

doi:10.3390/jimaging7120264

J Imaging. 2021 Dec 4;7(12):264. doi: 10.3390/jimaging7120264.

Authors

Sorn Sooksatra¹, Sitapa Rujikietgumjorn¹

Affiliation

¹ National Electronic and Computer Technology Center, National Science and Technology Development Agency, Pathum Thani 12120, Thailand.

Abstract

This paper presents an extended pedestrian attribute recognition network that uses skeleton data as a soft attention model to extract the local feature corresponding to a specific attribute. This technique preserves valuable information surrounding the target area and handles variation in human posture. The attention masks are designed to focus on both partial and whole-body regions. An augmented layer performs data augmentation inside the network to reduce over-fitting. The network was evaluated on two datasets (RAP and PETA) with several backbone networks (ResNet-50, Inception V3, and Inception-ResNet V2). The experimental results show that our network improves overall classification performance, raising mean accuracy by about 2-3% over the same backbone network, especially for local attributes and varied human postures.

Keywords: attention network; pedestrian attribute recognition; pose estimation.
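
As an illustration of the idea summarized in the abstract, the following is a minimal sketch of how skeleton keypoints could be turned into a soft attention mask and applied to a feature map. It is not the authors' implementation: the Gaussian shape of the mask, the `sigma` and `floor` parameters, and the function names `skeleton_soft_mask`, `apply_mask`, and `pooled_local_feature` are all assumptions made for this example. The non-zero `floor` is one simple way to keep some information from the area surrounding the target region instead of cutting it off entirely.

```python
import numpy as np


def skeleton_soft_mask(keypoints, height, width, sigma=2.0, floor=0.2):
    """Build a soft mask with values in [floor, 1] from (x, y) keypoints.

    Assumption: each keypoint contributes a Gaussian blob, and blobs are
    merged with an element-wise maximum. The paper may use another design.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=np.float64)
    for x, y in keypoints:
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        mask = np.maximum(mask, blob)
    # Rescale so that regions far from the skeleton keep weight `floor`
    # rather than being zeroed out (soft, not hard, attention).
    return floor + (1.0 - floor) * mask


def apply_mask(features, mask):
    """Weight a (C, H, W) feature map by an (H, W) mask via broadcasting."""
    return features * mask[None, :, :]


def pooled_local_feature(features, mask):
    """Mask-weighted average pooling, giving one value per channel."""
    weighted = apply_mask(features, mask)
    return weighted.sum(axis=(1, 2)) / mask.sum()


# Hypothetical usage: a 5-channel 8x6 feature map and a single keypoint
# at column 3, row 4 (for example, a joint relevant to one attribute).
features = np.ones((5, 8, 6))
mask = skeleton_soft_mask([(3, 4)], height=8, width=6)
local_feature = pooled_local_feature(features, mask)
```

A partial-body mask would pass only the keypoints relevant to one attribute, while a whole-body mask would pass all of them; in both cases the mask resolution is assumed to match the spatial size of the backbone's feature map.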