
Computer Engineering ›› 2024, Vol. 50 ›› Issue (3): 290-297. doi: 10.19678/j.issn.1000-3428.0066951

• Development Research and Engineering Application •

Multiscale Fusion Crowd Counting Algorithm Based on Attention Mechanism

Xinlin XIE1,2,*, Dongxu YIN1,2, Taoyuan ZHANG1,2, Gang XIE1,2

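  1. School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China
  2. Shanxi Key Laboratory of Advanced Control and Equipment Intelligence, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China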
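  • Received: 2023-02-15 Online: 2024-03-15 Published: 2023-07-18
  • Contact: Xinlin XIE
  • Supported by:
    National Natural Science Foundation of China (62006169); Key Research and Development Program of Shanxi Province (202202010101005); Doctoral Scientific Research Start-up Fund of Taiyuan University of Science and Technology (20192047); Scientific and Technological Innovation Program of Higher Education Institutions in Shanxi Province (2020L0347)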

Multiscale Fusion Crowd Counting Algorithm Based on Attention Mechanism

Xinlin XIE1,2,*, Dongxu YIN1,2, Taoyuan ZHANG1,2, Gang XIE1,2

  1. School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China
    2. Shanxi Key Laboratory of Advanced Control and Equipment Intelligence, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China
  • Received: 2023-02-15 Online: 2024-03-15 Published: 2023-07-18
  • Contact: Xinlin XIE

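Abstract:

To address the large variation in head scale and the high background noise in crowd counting images, a multiscale fusion crowd counting algorithm based on an attention mechanism is proposed to fully aggregate multiscale information and effectively distinguish background noise. Atrous spatial pyramid pooling based on residual connections is constructed; through the residual structure and several dilated convolutions with different dilation rates, it captures multiscale head features while incorporating the spatial detail of shallow feature maps, which improves feature map quality. A cross-layer multiscale feature fusion module is constructed to fuse edge detail information and contextual semantic information of different sizes from the shallow and deep branches, and a multi-branch feature fusion module is designed to fuse multiscale information from different receptive field sizes, alleviating the problem of large variation in head scale. A channel and spatial attention module based on matrix similarity operations is constructed to extract pixel-level feature weights, strengthening the network's ability to discriminate between background and head targets and adaptively correcting position information. Experimental results show that, compared with the best values among 11 comparison algorithms, the proposed algorithm reduces the mean absolute error and root mean square error by 1.4% and 4.2% on the SHA dataset and by 4.9% and 1.8% on the UCF_CC_50 dataset. It accurately predicts crowd distribution, estimates crowd counts, and generates high-quality crowd density maps.

Keywords: crowd counting, multiscale fusion, attention mechanism, convolutional neural network, density map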
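
The two evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration assuming the definitions conventional in crowd counting: the predicted count of an image is the sum of its density map, and both errors are computed over per-image counts. The function names and all numbers are hypothetical and are not taken from the paper; reading the reported percentages as relative reductions against the best competing value is also an assumption.

```python
import math


def count_from_density(density_map):
    """Estimated head count: the sum of all pixels of a 2D density map."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean absolute error over per-image counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def rmse(pred_counts, true_counts):
    """Root mean square error over per-image counts."""
    n = len(true_counts)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / n)


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline` (assumed reading)."""
    return 100.0 * (baseline - ours) / baseline


# Illustrative values only; none of these come from the paper.
pred = [102, 48, 250]
true = [100, 50, 260]
print(round(mae(pred, true), 3))                   # 4.667
print(rmse(pred, true))                            # 6.0
print(round(relative_reduction(100.0, 98.6), 1))   # 1.4
```

RMSE penalizes large per-image errors more heavily than MAE, which is why the two metrics can improve by different amounts on the same dataset.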

Abstract:

A multiscale fusion crowd counting algorithm based on an attention mechanism is proposed to address the large variation in head scale and the high background noise in crowd counting images, so as to fully aggregate multiscale information and effectively distinguish background noise. Atrous spatial pyramid pooling based on residual connections is constructed to capture multiscale head target features while incorporating spatial details from shallow feature maps through residual structures and multiple dilated convolutions with different dilation rates, thereby improving the quality of feature maps. A cross-layer multiscale feature fusion module is built to integrate edge details and contextual semantic information of different sizes from the shallow and deep branches. In addition, a multi-branch feature fusion module is designed to integrate multiscale information from different receptive field sizes, thereby alleviating the problem of large variation in head scale. A channel and spatial attention mechanism module based on matrix similarity operations is further constructed to extract pixel-level feature weights, enhance the network's ability to discriminate between background and head targets, and adaptively correct position information. The experimental results show that, compared with the best values among the 11 comparison algorithms, the proposed algorithm reduces the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) by 1.4% and 4.2% on the SHA dataset and by 4.9% and 1.8% on the UCF_CC_50 dataset. The proposed algorithm can accurately predict the crowd distribution, estimate the number of people, and generate high-quality crowd density maps.

Key words: crowd counting, multiscale fusion, attention mechanism, Convolutional Neural Network (CNN), density map
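
The idea of combining dilated convolutions of several rates with a residual connection, as described in the abstract, can be illustrated with a deliberately simplified one-dimensional sketch. Everything specific here is an assumption made for illustration and not the authors' design: the 1D signal, the size-3 kernel, the rates (1, 2, 3), the averaging of branches, and the function names are all hypothetical.

```python
def dilated_conv1d(signal, kernel, rate):
    """Zero-padded, same-length 1D convolution (cross-correlation form)
    whose kernel taps are spaced `rate` samples apart."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate
            if 0 <= j < len(signal):  # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size, rate):
    """Span of input samples covered by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def residual_multirate(signal, kernel, rates=(1, 2, 3)):
    """Average several dilated branches, then add the input back (residual)."""
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    fused = [sum(vals) / len(rates) for vals in zip(*branches)]
    return [x + f for x, f in zip(signal, fused)]


impulse = [0, 0, 0, 1, 0, 0, 0]
print(dilated_conv1d(impulse, [1, 1, 1], 2))  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
print(receptive_field(3, 2))                  # 5
print(residual_multirate(impulse, [1, 1, 1])[3])  # 2.0
```

Larger rates widen the receptive field without adding weights, so parallel branches respond to heads of different sizes, while the residual term keeps the fine detail of the input available to later layers.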