Skeleton-Based Attention Mask for Pedestrian Attribute Recognition Network

J Imaging. 2021 Dec 4;7(12):264. doi: 10.3390/jimaging7120264.

Abstract

This paper presents an extended pedestrian attribute recognition network that uses skeleton data as a soft attention model to extract the local feature corresponding to a specific attribute. This technique preserves valuable information surrounding the target area and handles variation in human posture. The attention masks are designed to focus on both partial and whole-body regions. An augmented layer performs data augmentation inside the network to reduce overfitting. The network was evaluated on two datasets (RAP and PETA) with various backbone networks (ResNet-50, Inception V3, and Inception-ResNet V2). The experimental results show that the network improves overall classification performance, raising mean accuracy by about 2-3% over the same backbone network, especially for local attributes and varied human postures.
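
The idea of a skeleton-based soft attention mask can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian weighting, the `sigma` parameter, the function names, and the keypoint coordinates are all assumptions chosen for illustration. The mask peaks at the skeleton keypoints of a body region and decays smoothly, so features around the target area are attenuated rather than discarded.

```python
import math


def soft_attention_mask(keypoints, height, width, sigma=2.0):
    """Build a height x width mask in [0, 1] that peaks at each keypoint.

    Assumption: a Gaussian falloff around each (x, y) keypoint; the paper's
    actual mask construction may differ.
    """
    mask = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for kx, ky in keypoints:
                dist_sq = (x - kx) ** 2 + (y - ky) ** 2
                weight = math.exp(-dist_sq / (2.0 * sigma ** 2))
                # Keep the strongest response over all keypoints.
                mask[y][x] = max(mask[y][x], weight)
    return mask


def apply_mask(feature_map, mask):
    """Element-wise product of a single-channel feature map and a mask."""
    return [
        [f * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature_map, mask)
    ]


# Hypothetical head-region keypoints (x, y) on an 8 x 4 feature map.
head_keypoints = [(1, 0), (2, 0)]
mask = soft_attention_mask(head_keypoints, height=8, width=4)

# A constant feature map makes the effect of the mask easy to inspect.
features = [[1.0] * 4 for _ in range(8)]
local_features = apply_mask(features, mask)
```

In this sketch, a partial-body mask uses only the keypoints of one region (here, the head), while a whole-body mask would pass all skeleton keypoints to the same function.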

Keywords: attention network; pedestrian attribute recognition; pose estimation.