Research on Human Key Point Detection with Coordinated Attention and Self-Attention

Computer Engineering ›› 2022, Vol. 48 ›› Issue (12): 86-94. doi: 10.19678/j.issn.1000-3428.0063010

LIU Shengjie^1,2, HE Ning^1,3, YU Haigang^1,2, WANG Cheng^1,2, HAN Wenjing^1,3

1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China;
2. College of Robotics, Beijing Union University, Beijing 100101, China;
3. Smart City College, Beijing Union University, Beijing 100101, China

Received: 2021-10-20 Revised: 2021-12-25 Published: 2022-01-14
About the authors: LIU Shengjie (born 1997), male, master's student, whose main research interests are computer vision and digital image processing; HE Ning (corresponding author), professor, Ph.D.; YU Haigang, WANG Cheng, and HAN Wenjing, master's students.
Funding: National Natural Science Foundation of China (61872042, 62172045); National Key Research and Development Program of China (2018AAA0100804); Key Project of the Science and Technology Plan of the Beijing Municipal Education Commission (KZ201911417048); Premium Funding Project for Academic Human Resources Development of Beijing Union University (BPHR2020AZ01, BPHR2020EZ01); Beijing Union University research project (ZK50202001).

Abstract

Abstract: Human key point detection has important applications in intelligent video surveillance, human-computer interaction, and other fields. Most deep-learning-based human key point detection algorithms focus only on adding multi-scale features or deepening the network, ignoring the information lost through repeated downsampling when low-resolution feature maps are produced. To address this problem, a high-resolution human key point detection network, CASANet, is proposed for human pose estimation in two-dimensional images. HRNet is used as the backbone network. A Coordinated Attention (CA) module is introduced on the 1/16-resolution feature map branch to capture position and channel information, and a self-attention module is used on the 1/32-resolution feature map branch to capture the internal correlations of the position and channel information. Together, these two modules are used to overcome the information loss that occurs when low-resolution feature maps are obtained. Experiments on the MS COCO val2017 dataset show that CASANet achieves higher detection accuracy with only a small increase in parameter count and computation, and that it extracts channel and position information more effectively. Compared with the baseline method, the Average Precision (AP) of CASANet improves by 2.4 percentage points.

Key words: human key point detection, Coordinated Attention (CA), self-attention, high-resolution network, channel attention
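
The two attention mechanisms named in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' CASANet code. The coordinate attention function follows the general published pattern (directional pooling along height and width, a shared channel-mixing step, then two sigmoid gates), but it omits batch normalization and uses ReLU for brevity. The self-attention function is generic scaled dot-product attention over spatial positions with identity query, key, and value projections. The names `coordinate_attention`, `self_attention`, `w_reduce`, `w_h`, and `w_w` are hypothetical and chosen only for this illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Gate a C x H x W feature map x[c][i][j] with row and column attention.

    w_reduce (Cr x C), w_h (C x Cr) and w_w (C x Cr) are hypothetical weights
    standing in for learned 1x1 convolutions.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Directional average pooling: one descriptor per row and one per column.
    pool_h = [[sum(x[c][i]) / W for i in range(H)] for c in range(C)]
    pool_w = [[sum(x[c][i][j] for i in range(H)) / H for j in range(W)]
              for c in range(C)]
    # Concatenate along the spatial axis, mix channels, apply ReLU.
    joint = [pool_h[c] + pool_w[c] for c in range(C)]
    mid = [[max(0.0, v) for v in row] for row in matmul(w_reduce, joint)]
    mid_h = [row[:H] for row in mid]
    mid_w = [row[H:] for row in mid]
    # Expand back to C channels and squash into (0, 1) gates.
    gate_h = [[sigmoid(v) for v in row] for row in matmul(w_h, mid_h)]
    gate_w = [[sigmoid(v) for v in row] for row in matmul(w_w, mid_w)]
    return [[[x[c][i][j] * gate_h[c][i] * gate_w[c][j] for j in range(W)]
             for i in range(H)] for c in range(C)]


def self_attention(x):
    """Scaled dot-product self-attention over the H*W positions of x[c][i][j].

    Assumption: query, key and value projections are the identity.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # One C-dimensional token per spatial position, in row-major order.
    tokens = [[x[c][i][j] for c in range(C)] for i in range(H) for j in range(W)]
    scale = math.sqrt(C)
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(C)])
    # Reshape the tokens back to C x H x W.
    return [[[out[i * W + j][c] for j in range(W)] for i in range(H)]
            for c in range(C)]
```

Both functions keep the input shape, so in a multi-branch backbone such as HRNet a module of either kind could in principle be attached to a single resolution branch without changing the tensors passed to the other branches. The cost of the self-attention sketch grows quadratically with the number of positions, which is one plausible reason for applying it on the lowest-resolution branch.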

CLC number: TP391.4

LIU Shengjie, HE Ning, YU Haigang, WANG Cheng, HAN Wenjing. Research on Human Key Point Detection with Coordinated Attention and Self-Attention[J]. Computer Engineering, 2022, 48(12): 86-94.

https://www.ecice06.com/CN/Y2022/V48/I12/86

Figures/Tables: 11 items (images omitted)

