COMAL: compositional multi-scale feature enhanced learning for crowd counting

Multimed Tools Appl. 2022;81(15):20541-20560. doi: 10.1007/s11042-022-12249-9. Epub 2022 Mar 11.

Abstract

Accurately modeling head scale variations in crowds is an effective way to improve the accuracy of crowd counting methods. Most counting networks apply a multi-branch network structure to obtain head features at different scales. Although they have achieved promising results, they do not perform well in scenes with extreme scale variation because of their limited scale representation ability. Meanwhile, these methods are prone to recognizing background objects as foreground crowds in complex scenes because they exploit only limited context and high-level semantic information. We propose a compositional multi-scale feature enhanced learning approach (COMAL) for crowd counting to address these limitations. COMAL enhances the multi-scale feature representations from three aspects: (1) a semantic enhanced module (SEM) embeds high-level semantic information into the multi-scale features; (2) a diversity enhanced module (DEM) enriches the scale diversity of the crowd features; (3) a context enhanced module (CEM) strengthens the multi-scale features with richer context information. Based on the proposed COMAL, we develop a crowd counting network under the encoder-decoder framework and perform extensive experiments on the ShanghaiTech, UCF_CC_50, and UCF-QNRF datasets. Qualitative and quantitative results demonstrate the effectiveness of the proposed COMAL.
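The abstract describes the overall architecture (encoder, three enhancement modules, decoder producing a density map) but gives no implementation details. The following is a minimal PyTorch sketch of that high-level composition only: the class names, layer choices, channel widths, and the internal design of the SEM/DEM/CEM stand-ins are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a toy encoder-decoder crowd
# counter with three hypothetical blocks standing in for SEM / DEM / CEM.
# All layer choices and names below are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticEnhanced(nn.Module):
    """Stand-in for SEM: reweights channels with a globally pooled descriptor,
    a simple way to inject high-level semantic information."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        g = F.adaptive_avg_pool2d(x, 1)        # global semantic summary
        return x * torch.sigmoid(self.fc(g))   # channel-wise reweighting


class DiversityEnhanced(nn.Module):
    """Stand-in for DEM: parallel dilated branches enrich scale diversity."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in (1, 2, 3)
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


class ContextEnhanced(nn.Module):
    """Stand-in for CEM: pooled context at several window sizes is upsampled
    and added back to strengthen the features with context information."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        ctx = sum(
            F.interpolate(F.adaptive_avg_pool2d(x, s), size=(h, w),
                          mode="bilinear", align_corners=False)
            for s in (1, 2, 4)
        )
        return x + self.proj(ctx)


class ToyCOMALCounter(nn.Module):
    """Encoder -> enhancement -> decoder producing a density map; the
    predicted count is the sum over the density map."""
    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.enhance = nn.Sequential(
            SemanticEnhanced(channels),
            DiversityEnhanced(channels),
            ContextEnhanced(channels),
        )
        self.decoder = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        density = F.relu(self.decoder(self.enhance(self.encoder(x))))
        return density, density.sum(dim=(1, 2, 3))  # per-image count estimate


if __name__ == "__main__":
    model = ToyCOMALCounter()
    density, count = model(torch.randn(1, 3, 256, 256))
    print(density.shape, count.shape)
```

In a density-estimation setting such as this, the network would typically be trained with a pixel-wise loss (e.g., MSE) against Gaussian-smoothed head annotations; the specific losses and backbone used by COMAL are not stated in the abstract.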

Keywords: Convolutional neural network; Crowd counting; Crowd density estimation; Multi-scale feature learning.