Crowd Counting Network Based on Attention Mechanism and Multiscale Fusion

doi:10.19678/j.issn.1000-3428.0069071

Abstract

To address scale variation and background interference in crowd image counting, this paper proposes a network model that fully exploits multiscale information and mitigates the impact of background noise. The model first employs ConvNeXt as the backbone for feature extraction. A Multilevel Feature Fusion Module (MFFM) then fuses features from different layers of the backbone across scales; the fused features carry semantic information at multiple scales and are therefore better suited to the scale variation found in crowd counting. Next, a MultiScale Attention Module (MSAM) uses branches with different receptive fields to extract features at various scales, applies Selective Kernel Channel Attention (SKCA) to alleviate the feature similarity that arises in multicolumn structures, and feeds the attention map it generates back into the features of the corresponding scale to suppress background interference. The proposed model achieves a Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) of 56.1 and 93.9 on the ShanghaiTechA dataset, 6.1 and 10.3 on ShanghaiTechB, 174.9 and 252.7 on UCF_CC_50, and 1.42 and 1.85 on Mall. These results on public datasets indicate that the proposed model improves both accuracy and robustness over existing representative crowd counting methods.
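For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how MAE and RMSE are usually computed from per-image counts in density-map-based crowd counting. The function names and the toy numbers are assumptions made for illustration; they are not taken from the paper, and the paper's evaluation code may differ in detail.

```python
import math


def count_from_density(density_map):
    """Estimated count of one image: the sum of all density-map values."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean Absolute Error over per-image counts."""
    errors = [abs(p - t) for p, t in zip(pred_counts, true_counts)]
    return sum(errors) / len(errors)


def rmse(pred_counts, true_counts):
    """Root Mean Square Error over per-image counts."""
    squared = [(p - t) ** 2 for p, t in zip(pred_counts, true_counts)]
    return math.sqrt(sum(squared) / len(squared))


# Toy counts, invented for illustration (not results from the paper).
pred = [102.0, 48.0, 250.0]
true = [100.0, 50.0, 260.0]
print("MAE :", mae(pred, true))
print("RMSE:", rmse(pred, true))
```

Because RMSE squares each error before averaging, a single badly miscounted image raises it more than it raises MAE, which is why the two are commonly read together as accuracy and robustness indicators.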

Keywords: crowd counting, multiscale feature fusion, attention mechanism, neural networks, density map
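The abstract does not spell out how SKCA is built, so the sketch below only illustrates the general selective-kernel idea it alludes to: each branch (receptive field) receives a per-channel weight from a softmax taken across branches, and the branches are then blended with those weights. Everything here is an assumption for illustration. In particular, the pooled channel means are used directly as logits, whereas a real module would normally pass them through learned layers first, and all function names are hypothetical.

```python
import math


def global_avg_pool(feature):
    """feature: list of channels, each a 2-D list. Returns one mean per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]


def branch_weights(branch_logits):
    """Softmax across branches, computed independently for every channel."""
    n_channels = len(branch_logits[0])
    weights = [[0.0] * n_channels for _ in branch_logits]
    for c in range(n_channels):
        logits = [b[c] for b in branch_logits]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        for b, e in enumerate(exps):
            weights[b][c] = e / total
    return weights


def fuse_branches(branches):
    """Blend same-shaped branch features using per-channel branch weights."""
    # Assumption: pooled means stand in for the logits a learned layer would give.
    w = branch_weights([global_avg_pool(f) for f in branches])
    n_ch, height, width = len(branches[0]), len(branches[0][0]), len(branches[0][0][0])
    fused = [
        [
            [
                sum(w[b][c] * branches[b][c][i][j] for b in range(len(branches)))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for c in range(n_ch)
    ]
    return fused, w


# Two toy single-channel 2x2 branch outputs (invented values).
branch_small = [[[1.0, 1.0], [1.0, 1.0]]]
branch_large = [[[3.0, 3.0], [3.0, 3.0]]]
fused, weights = fuse_branches([branch_small, branch_large])
print("per-channel weights:", weights)
print("fused feature:", fused)
```

Because the weights for a channel always sum to one, each fused value stays between the corresponding branch values, and branches that respond differently end up contributing differently instead of being averaged uniformly.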

LUAN Fangjun, GONG Qi, YUAN Shuai. Crowd Counting Network Based on Attention Mechanism and Multiscale Fusion[J]. Computer Engineering, 2025, 51(3): 352-361.

栾方军, 龚琪, 袁帅. 基于注意力机制和多尺度融合的人群计数网络[J]. 计算机工程, 2025, 51(3): 352-361.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069071

https://www.ecice06.com/EN/Y2025/V51/I3/352

Figures/Tables 12

Fig.1 CNCount network structure

Fig.2 MFFM structure

Fig.3 SKCA module

Fig.4 Visualization results on ShanghaiTech PartA dataset

Fig.5 Visualization results on ShanghaiTech PartB dataset

Fig.6 Visualization results on UCF_CC_50 dataset

Fig.7 Visualization results on Mall dataset
