Human Pose Estimation Using High Resolution Network with Dual Attention

doi:10.19678/j.issn.1000-3428.0060493

Abstract

In human pose estimation, high-resolution networks often fail to capture multi-channel and spatial feature information effectively when extracting and fusing features from feature maps, which reduces estimation accuracy. To address this problem, this paper proposes ENNet, a high-resolution network with dual attention for human pose estimation, built on the High-Resolution Network (HRNet). Channel attention is introduced to construct the E-ecaneck and E-ecablock modules, which serve as the basic modules and extract as much useful information as possible from multiple channels. In the multi-resolution fusion stage of each subnetwork, a spatial attention mechanism is integrated to extract and fuse feature information at different resolutions. Finally, the high-resolution representations fused with the low-resolution ones are output by upsampling. The network is validated and tested on the public MS COCO2017 dataset. The results show that, compared with HRNet, the proposed method increases mAP by 3.4%, effectively improves the network's ability to fuse information across multi-resolution representations, and significantly improves the estimation accuracy of the baseline HRNet.

Key words: human pose estimation, high resolution network, multi-resolution fusion, channel attention, spatial attention
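
The two attention mechanisms named in the abstract can be illustrated with a minimal sketch. It assumes the channel attention follows the ECA-Net style (global average pooling, a 1D convolution across channels, a sigmoid gate) and uses a deliberately simplified spatial gate built from per-pixel channel mean and max. The abstract does not specify the internals of the E-ecaneck and E-ecablock modules or of the fusion stage, so the function names, the placeholder weights, and the spatial gate below are hypothetical and are not the authors' implementation.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size as in ECA-Net: odd value near (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, weights=None):
    """ECA-style gate for a feature map stored as nested lists [C][H][W]."""
    channels = len(feature_map)
    if weights is None:
        k = eca_kernel_size(channels)
        weights = [1.0 / k] * k  # placeholder; learned in a real network
    # Global average pooling: one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_map]
    # Zero-padded 1D convolution across neighbouring channels.
    pad = len(weights) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    gates = [sigmoid(sum(w * padded[c + j] for j, w in enumerate(weights)))
             for c in range(channels)]
    scaled = [[[v * gates[c] for v in row] for row in ch]
              for c, ch in enumerate(feature_map)]
    return gates, scaled


def spatial_attention(feature_map, w_mean=1.0, w_max=1.0):
    """Simplified spatial gate (assumption): mixes per-pixel channel mean and max."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    gates = [[sigmoid(
                  w_mean * sum(feature_map[c][i][j] for c in range(channels)) / channels
                  + w_max * max(feature_map[c][i][j] for c in range(channels)))
              for j in range(width)]
             for i in range(height)]
    scaled = [[[feature_map[c][i][j] * gates[i][j] for j in range(width)]
               for i in range(height)]
              for c in range(channels)]
    return gates, scaled


if __name__ == "__main__":
    # Toy feature map: 4 channels of size 2x2.
    fm = [[[1.0, 2.0], [3.0, 4.0]],
          [[0.0, 0.0], [0.0, 0.0]],
          [[-1.0, 1.0], [-1.0, 1.0]],
          [[0.5, 0.5], [0.5, 0.5]]]
    channel_gates, refined = channel_attention(fm)
    pixel_gates, fused = spatial_attention(refined)
    print(channel_gates)
    print(pixel_gates)
```

Applying the channel gate first and the spatial gate second mirrors the order in the abstract, where channel attention sits in the basic modules and spatial attention in the multi-resolution fusion stage. In a trained network both sets of weights would be learned rather than fixed.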

CLC Number: TP391.4

LUO Mengshi, XU Yang, YE Xingxin. Human Pose Estimation Using High Resolution Network with Dual Attention[J]. Computer Engineering, 2022, 48(2): 314-320.

罗梦诗, 徐杨, 叶星鑫. 融入双注意力的高分辨率网络人体姿态估计[J]. 计算机工程, 2022, 48(2): 314-320.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0060493

http://www.ecice06.com/EN/Y2022/V48/I2/314

