
Computer Engineering ›› 2022, Vol. 48 ›› Issue (2): 314-320. doi: 10.19678/j.issn.1000-3428.0060493

• Development Research and Engineering Application •

Human Pose Estimation Using High Resolution Network with Dual Attention

LUO Mengshi1, XU Yang1,2, YE Xingxin1

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China;
  2. Guiyang Aluminum-magnesium Design and Research Institute Co., Ltd., Guiyang 550009, China
  • Received:2021-01-05 Revised:2021-02-25 Published:2021-02-26
  • About the authors: LUO Mengshi (born 1995), female, master's student, research interests: computer vision and machine learning; XU Yang (corresponding author), associate professor, Ph.D.; YE Xingxin, master's student.
  • Funding: Guizhou University Talent Introduction Project (2015-12); Guizhou Provincial Science and Technology Plan Project (Qian Ke He LH Zi [2016] No. 7429).

Abstract: In human pose estimation, high-resolution networks do not effectively capture multi-channel information and spatial feature information when extracting and fusing feature maps, which limits the accuracy of the estimated poses. To address this problem, this paper proposes ENNet, a high-resolution human pose estimation network with dual attention built on the High-Resolution Network (HRNet). Channel attention is introduced to construct the E-ecaneck and E-ecablock modules, which serve as the basic modules and extract as much useful information as possible from multiple channels. A spatial attention mechanism is then integrated into the multi-resolution fusion stage of each stage's subnetwork to extract and fuse feature information at different resolutions. Finally, the low-resolution representations are upsampled and fused so that the network outputs a high-resolution representation that incorporates all of them. The network is validated and tested on the public MS COCO2017 dataset. The results show that, compared with HRNet, the proposed method improves mAP by 3.4%, strengthens the fusion of information across multi-resolution representations, and clearly improves on the estimation accuracy of the baseline HRNet.

Key words: human pose estimation, high resolution network, multi-resolution fusion, channel attention, spatial attention
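
The abstract names two attention mechanisms but does not specify their internals, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the channel attention is ECA-style (global average pooling, a 1D convolution across neighbouring channels, a sigmoid gate), which the module names E-ecaneck and E-ecablock suggest but the abstract does not confirm. It also assumes a simplified spatial gate computed from the per-pixel channel mean and maximum. All function names, the kernel weights, and the use of nearest-neighbour upsampling are hypothetical choices made for illustration. Feature maps are plain nested lists of shape [C][H][W].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature, kernel):
    """ECA-style channel gate (assumed form): pool, 1D conv over channels, sigmoid."""
    # Squeeze each channel to one number by global average pooling.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Zero-pad so the odd-length kernel can slide over every channel.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    # One gate per channel, computed from its neighbouring channels.
    gates = [
        sigmoid(sum(k * padded[c + i] for i, k in enumerate(kernel)))
        for c in range(len(pooled))
    ]
    # Rescale every value in a channel by that channel's gate.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature, gates)]


def spatial_attention(feature, w_avg=1.0, w_max=1.0):
    """Simplified spatial gate (assumed form): one weight per pixel position."""
    channels, height, width = len(feature), len(feature[0]), len(feature[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for y in range(height):
        for x in range(width):
            values = [feature[c][y][x] for c in range(channels)]
            # Mix the channel-wise mean and max into a single gate in (0, 1).
            gate = sigmoid(w_avg * sum(values) / channels + w_max * max(values))
            for c in range(channels):
                out[c][y][x] = feature[c][y][x] * gate
    return out


def upsample_nearest(feature, factor):
    """Nearest-neighbour upsampling of each channel by an integer factor."""
    return [
        [[v for v in row for _ in range(factor)] for row in ch for _ in range(factor)]
        for ch in feature
    ]


def fuse(high, low, factor):
    """Add a spatially gated, upsampled low-resolution map to the high-resolution map.

    Assumes both maps already have the same number of channels.
    """
    low_up = spatial_attention(upsample_nearest(low, factor))
    return [
        [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(high, low_up)
    ]


high = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]  # 2 channels, 4x4
low = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]   # 2 channels, 2x2
refined = channel_attention(high, kernel=[0.25, 0.5, 0.25])
fused = fuse(refined, low, factor=2)
```

In this sketch the channel gate decides which channels matter and the spatial gate decides which positions matter, and the fused output keeps the resolution of the high-resolution branch. A trained network would learn the kernel and gate weights and would normally use convolutions to match channel counts before fusion.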

CLC Number: