Computer Engineering ›› 2022, Vol. 48 ›› Issue (5): 235-241, 250. doi: 10.19678/j.issn.1000-3428.0063039

• Graphics and Image Processing •

Crowd Counting Network with Attention Mechanism and Contextual Density Map

WU Qiyuan, WANG Xiaodong, ZHANG Lianjun, GAO Hailing, ZHAO Shenhao

  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
  • Received: 2021-10-25 Revised: 2021-12-23 Published: 2021-12-28
  • About the authors: WU Qiyuan (born 1996), male, master's student, main research interest: deep learning; WANG Xiaodong, professor; ZHANG Lianjun (corresponding author), experimentalist; GAO Hailing and ZHAO Shenhao, master's students.
  • Funding:
    National Natural Science Foundation of China, "Research on Perceptual Quality Models and Rendering for Ultra-High-Definition Free-Viewpoint Video" (61771269); Ningbo Natural Science Foundation, "Research on Stereoscopic Video Quality Assessment for Free-Viewpoint Video Systems" (2019A610107).

Abstract: To analyze crowd flow in commercial districts or to prevent public incidents such as stampedes, crowd counting methods are commonly used to count the people in surveillance images and thereby provide early warning. Because of target occlusion, background interference, and multi-scale variation, existing crowd counting methods miscount or miss people, which lowers their accuracy. This paper proposes CADMFNet, a crowd counting network based on the fusion of an attention mechanism and contextual density maps. Some of the convolutional layers of VGG16 serve as the front-end network, an up-sampling fusion module fuses the contextual features of the input feature maps, and dilated convolutions with different dilation rates serve as the back-end network to generate high-quality intermediate density maps. On this basis, a contextual attention module fuses the intermediate density maps from different levels to obtain a refined crowd density map. Experimental results show that the network achieves a Mean Absolute Error (MAE) of 1.31 and a Root Mean Square Error (RMSE) of 1.59 on the Mall dataset. Compared with networks such as CSRNet and MCNN, it effectively improves counting accuracy and is more robust.

Key words: crowd counting, feature fusion, dilated convolution, attention mechanism, Convolutional Neural Network (CNN)
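The two metrics reported in the abstract can be illustrated with a minimal Python sketch. It assumes the definitions commonly used in crowd counting: the predicted count for an image is the sum of its density map, and the errors are computed over per-image counts. The function names and the toy data are hypothetical and are not taken from the paper, whose evaluation code is not shown here.

```python
import math

def count_from_density_map(density_map):
    """Estimated head count: the sum of all values in a 2-D density map."""
    return sum(sum(row) for row in density_map)

def mae(true_counts, pred_counts):
    """Mean absolute error between true and predicted per-image counts."""
    n = len(true_counts)
    return sum(abs(t - p) for t, p in zip(true_counts, pred_counts)) / n

def rmse(true_counts, pred_counts):
    """Root mean square error between true and predicted per-image counts."""
    n = len(true_counts)
    squared = sum((t - p) ** 2 for t, p in zip(true_counts, pred_counts))
    return math.sqrt(squared / n)

# Toy data for illustration only (not results from the paper).
density = [[0.0, 0.5, 0.5],
           [1.0, 0.0, 0.0]]          # sums to 2 people
true_counts = [2.0, 10.0, 7.0]
pred_counts = [count_from_density_map(density), 9.0, 9.0]

print("MAE: ", mae(true_counts, pred_counts))
print("RMSE:", rmse(true_counts, pred_counts))
```

MAE weights all errors equally, whereas RMSE penalizes large per-image errors more heavily, which is why crowd counting work often treats it as an indicator of robustness.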

CLC number: