Crowd counting is a highly challenging problem in computer vision and machine learning. Most previous methods have focused on crowds of consistent density, i.e., either a sparse or a dense crowd, so they perform well on global estimation but neglect local accuracy. To make crowd counting more useful in the real world, we propose a new perspective, named pan-density crowd counting, which aims to count people in crowds of varying density. Specifically, we propose the Pan-Density Network (PaDNet), which is composed of the following critical components. First, the Density-Aware Network (DAN) contains multiple subnetworks pretrained on scenarios with different densities; this module is capable of capturing pan-density information. Second, the Feature Enhancement Layer (FEL) captures the global and local contextual features and generates a weight for each density-specific feature. Third, the Feature Fusion Network (FFN) embeds spatial context and fuses these density-specific features. Further, the metrics Patch MAE (PMAE) and Patch RMSE (PRMSE) are proposed to better evaluate performance on both global and local estimation. Extensive experiments on four crowd counting benchmark datasets (ShanghaiTech, UCF_CC_50, UCSD, and UCF-QNRF) indicate that PaDNet achieves state-of-the-art recognition performance and high robustness in pan-density crowd counting.
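The abstract does not spell out how PMAE and PRMSE are computed, so the following is only an illustrative sketch of one plausible reading: each density map is split into a grid of non-overlapping patches, and the absolute and squared count errors are averaged over all patches of all images. The function names `patch_counts` and `patch_errors`, the `grid` parameter, and the square grid layout are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def patch_counts(density_map, grid):
    """Sum a 2-D density map over a grid x grid layout of non-overlapping patches."""
    h, w = density_map.shape
    # Patch boundaries; linspace also copes with sizes not divisible by `grid`.
    rows = np.linspace(0, h, grid + 1).astype(int)
    cols = np.linspace(0, w, grid + 1).astype(int)
    counts = np.empty((grid, grid), dtype=np.float64)
    for i in range(grid):
        for j in range(grid):
            patch = density_map[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
            counts[i, j] = patch.sum()
    return counts


def patch_errors(pred_maps, gt_maps, grid):
    """Return (patch MAE, patch RMSE) averaged over every patch of every image."""
    diffs = []
    for pred, gt in zip(pred_maps, gt_maps):
        diff = patch_counts(pred, grid) - patch_counts(gt, grid)
        diffs.append(diff.ravel())
    diffs = np.concatenate(diffs)
    pmae = float(np.abs(diffs).mean())
    prmse = float(np.sqrt((diffs ** 2).mean()))
    return pmae, prmse


# Toy case (assumed data): the total count matches, but every patch is wrong.
pred = np.array([[1.0, 0.0], [0.0, 1.0]])
gt = np.array([[0.0, 1.0], [1.0, 0.0]])
global_mae, global_rmse = patch_errors([pred], [gt], grid=1)
local_mae, local_rmse = patch_errors([pred], [gt], grid=2)
```

In this sketch, `grid=1` reduces to the usual image-level MAE and RMSE, which are both zero for the toy case because the errors cancel across the image. With `grid=2` the same prediction scores an error of one person per patch, which is the kind of local inaccuracy that a purely global metric hides.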
All Science Journal Classification (ASJC) codes
- Computer Graphics and Computer-Aided Design