
Computer Engineering ›› 2022, Vol. 48 ›› Issue (9): 314-320. doi: 10.19678/j.issn.1000-3428.0062606

• Development Research and Engineering Application •

Crowd Counting Network Based on Background Suppression and Context Awareness

HUANG Yiqiu1, HU Xiao2, YANG Jiaxin1, OU Jiamin1

  1. School of Electronics and Communication Engineering, Guangzhou University, Guangzhou 510006, China;
  2. School of Mechanical and Electrical Engineering, Guangzhou University, Guangzhou 510006, China
  • Received: 2021-09-06 Revised: 2021-10-28 Published: 2021-11-02
  • About the authors: HUANG Yiqiu (born 1997), male, master's student, whose research interests are computer vision and crowd counting; HU Xiao, professor, Ph.D.; YANG Jiaxin and OU Jiamin, master's students.
  • Funding:
    National Natural Science Foundation of China (62076075).

Abstract: To address the loss of counting accuracy caused by image background noise and perspective distortion in crowd counting, a new network based on background suppression and context awareness is proposed. A VGG-16 network extracts image features, which are fed into a Density Map Generation (DMG) module and a Background Noise Suppression (BNS) module to produce a density feature map and a spatial attention map, respectively. The BNS module then refines the density feature map into a primary density map, suppressing background noise in the image and increasing the feature weight of crowd regions. To reduce the influence of perspective distortion on crowd density estimation, a Weight Enhancement-Context Aware Network (WE-CAN) refines the primary density map and generates the predicted density map. Experimental results on three public datasets, ShanghaiTech, UCF-CC-50, and UCF-QNRF, show that the network is more accurate than networks such as the Multi-Column Convolutional Neural Network (MCNN), the Switching Convolutional Neural Network (SwitchCNN), and the Congested Scene Recognition Network (CSRNet). On UCF-QNRF in particular, the proposed network reaches a Mean Absolute Error (MAE) of 85.8 and a Mean Square Error (MSE) of 146.0, reductions of up to 69.0% and 67.2%, respectively, compared with the other networks. The network effectively suppresses image background noise, reduces the error caused by perspective distortion, and shows good generalization ability and robustness.
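The abstract reports MAE and MSE without giving their formulas. The sketch below shows how these metrics are commonly computed in crowd counting, under two assumptions that the abstract does not confirm: the predicted count of an image is the sum of its density map, and the reported "MSE" follows the convention frequent in this literature of taking the square root of the mean squared count error. All function names and the toy counts are hypothetical.

```python
import math


def count_from_density(density_map):
    """Estimated head count: the sum of all values in a 2-D density map."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def mse(pred_counts, true_counts):
    """Root of the mean squared count error (assumed crowd-counting convention)."""
    n = len(true_counts)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / n)


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical toy counts for three images (not taken from the paper).
pred = [98.0, 210.0, 47.0]
true = [100.0, 200.0, 50.0]
print(mae(pred, true))
print(mse(pred, true))
```

A percentage such as the reported 69.0% would then correspond to `relative_reduction(baseline_mae, 85.8)` for whichever compared network has the largest error; the abstract does not list the baseline values.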

Key words: crowd counting, deep learning, density map, background noise, context awareness
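The idea of using a spatial attention map to suppress background noise in a density feature map can be illustrated with a minimal sketch. It assumes, without confirmation from the abstract, that the attention values are obtained with a sigmoid and applied by element-wise multiplication, which is one common way to fuse attention with features; the actual BNS module may differ. The function names and the 2x2 toy maps are hypothetical.

```python
import math


def sigmoid(x):
    """Map a raw attention score to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def suppress_background(density_features, attention_logits):
    """Weight each location of a 2-D density feature map by its attention value.

    Locations with strongly negative logits (assumed background) are pushed
    toward zero, while locations with strongly positive logits (assumed crowd)
    are kept almost unchanged.
    """
    if len(density_features) != len(attention_logits):
        raise ValueError("maps must have the same number of rows")
    weighted = []
    for feat_row, att_row in zip(density_features, attention_logits):
        if len(feat_row) != len(att_row):
            raise ValueError("maps must have the same number of columns")
        weighted.append([f * sigmoid(a) for f, a in zip(feat_row, att_row)])
    return weighted


# Hypothetical toy maps: top row is crowd, bottom row is background clutter.
features = [[0.8, 0.6], [0.5, 0.7]]
logits = [[6.0, 6.0], [-6.0, -6.0]]
primary = suppress_background(features, logits)
print(primary)
```

In this toy case the bottom row, which would otherwise contribute spurious density, is scaled to nearly zero, so a count obtained by summing the map is dominated by the crowd region.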

CLC Number: