Spatial-Frequency Attention Network for Crowd Counting

Big Data. 2022 Oct;10(5):453-465. doi: 10.1089/big.2022.0039. Epub 2022 Jun 9.

Abstract

Counting the number of people in crowded scenarios is a crucial task in video surveillance and urban security systems. Widely deployed surveillance cameras provide big data for training a compelling deep learning-based counting network. However, the problem of large scale variations in dense crowds is still not entirely solved. To address this problem, we propose a spatial-frequency attention network (SFANet) for crowd counting in this article. A bottleneck spatial attention module is built to emphasize features at various spatial locations and adaptively select regions containing individuals in the spatial domain. As a complement, in the frequency domain, a multispectral channel attention module is adopted to obtain a more complete set of frequency components for representing each channel. The two attention modules are combined so that, by reinforcing each other, they focus on the discriminative regions and suppress misleading information. Experimental results on five benchmark crowd data sets demonstrate that SFANet achieves state-of-the-art performance in terms of accuracy and robustness.
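The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the two kinds of attention it names, not the authors' SFANet code. It assumes that the multispectral channel attention pools each group of channels with a different 2D DCT basis function (global average pooling corresponds to the lowest frequency) and that the spatial attention reduces the channels to a single sigmoid mask. The function names, the weight matrices w1, w2, and proj, and the choice of frequencies are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dct_basis(h, w, u, v):
    """2D DCT-II basis function of frequency (u, v) on an h x w grid."""
    ys = np.cos(np.pi * u * (np.arange(h) + 0.5) / h)
    xs = np.cos(np.pi * v * (np.arange(w) + 0.5) / w)
    return np.outer(ys, xs)

def multispectral_channel_attention(feat, freqs, w1, w2):
    """Reweight channels of feat (C, H, W) using DCT-pooled descriptors.

    Assumption: channels are split into len(freqs) groups and each group
    is pooled with one DCT frequency, then passed through a two-layer
    bottleneck (w1: (R, C), w2: (C, R)) and a sigmoid.
    """
    c, h, w = feat.shape
    groups = np.array_split(np.arange(c), len(freqs))
    desc = np.empty(c)
    for idx, (u, v) in zip(groups, freqs):
        # One scalar descriptor per channel: projection onto the basis.
        desc[idx] = (feat[idx] * dct_basis(h, w, u, v)).sum(axis=(1, 2))
    hidden = np.maximum(w1 @ desc, 0.0)  # ReLU bottleneck
    weights = sigmoid(w2 @ hidden)       # one weight per channel
    return feat * weights[:, None, None], weights

def spatial_attention(feat, proj):
    """Reweight locations of feat (C, H, W) with a single-channel mask.

    Assumption: proj (C,) plays the role of a 1x1 convolution that
    collapses the channels into one attention logit per location.
    """
    logits = np.tensordot(proj, feat, axes=(0, 0))  # shape (H, W)
    mask = sigmoid(logits)
    return feat * mask[None, :, :], mask
```

In this sketch the two modules could be applied in sequence or in parallel to a backbone feature map; the abstract does not say how SFANet combines them, so that choice is left open here.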

Keywords: convolutional neural network; crowd counting; density estimation; spatial-frequency attention.

Publication types

  • Research Support, Non-U.S. Gov't