
Computer Engineering, 2022, Vol. 48, Issue (12): 86-94. doi: 10.19678/j.issn.1000-3428.0063010

• Artificial Intelligence and Pattern Recognition •

Research on Human Key Point Detection with Coordinated Attention and Self-Attention

LIU Shengjie1,2, HE Ning1,3, YU Haigang1,2, WANG Cheng1,2, HAN Wenjing1,3

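  1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China;
    2. College of Robotics, Beijing Union University, Beijing 100101, China;
    3. Smart City College, Beijing Union University, Beijing 100101, China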
  • Received: 2021-10-20 Revised: 2021-12-25 Published: 2022-01-14
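  • About the authors: LIU Shengjie (born 1997), male, master's degree candidate, main research interests: computer vision and digital image processing; HE Ning (corresponding author), professor, Ph.D.; YU Haigang, WANG Cheng, and HAN Wenjing, master's degree candidates.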
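  • Funding:
    National Natural Science Foundation of China (61872042, 62172045); National Key Research and Development Program of China (2018AAA0100804); Key Project of the Science and Technology Plan of the Beijing Municipal Education Commission (KZ201911417048); Beijing Union University Talent Strengthening Priority Program (BPHR2020AZ01, BPHR2020EZ01); Beijing Union University Research Project (ZK50202001).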




Abstract: Human key point detection has important applications in intelligent video surveillance, human-computer interaction, and other fields. Most deep-learning-based human key point detection algorithms focus only on adding multi-scale features or deepening the network, and ignore the information lost through the repeated downsampling used to obtain low-resolution feature maps. To solve this problem, CASANet, a high-resolution human key point detection network, is proposed for human pose estimation in two-dimensional images. With HRNet as the backbone, a Coordinate Attention (CA) module is introduced on the 1/16-resolution feature map branch to capture position and channel information, and a self-attention module is applied on the 1/32-resolution feature map branch to capture the internal correlations of position and channel information. Together, these two modules counter the information loss that occurs when low-resolution feature maps are obtained. Experiments on the MS COCO val2017 dataset show that CASANet achieves higher detection accuracy with only a small increase in parameters and computation, and effectively improves the extraction of channel and position information. Compared with the baseline method, CASANet improves Average Precision (AP) by 2.4 percentage points.
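The abstract names the Coordinate Attention module but not its internals. The sketch below follows the general Coordinate Attention design for a single image in NumPy. It pools along width and along height separately, applies a shared 1×1 transform, and produces per-row and per-column sigmoid gates. It is an illustrative assumption, not the authors' implementation. Batch normalization and biases are omitted, the weights are random, and the function names (`coordinate_attention`, `h_swish`), the channel count, the 16×12 spatial size (an HRNet-W32-like 1/16 branch for a 256×192 input) and the reduction ratio `r` are all hypothetical.

```python
import numpy as np


def h_swish(x):
    # Hard-swish non-linearity: x * ReLU6(x + 3) / 6
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w_shared, w_h, w_w):
    """Coordinate-attention-style reweighting of one feature map (BN and biases omitted).

    x:        (C, H, W) feature map
    w_shared: (C_mid, C) shared 1x1 conv applied to the pooled descriptors
    w_h, w_w: (C, C_mid) 1x1 convs that produce the height and width gates
    """
    h = x.shape[1]
    z_h = x.mean(axis=2)                      # (C, H): pool along width
    z_w = x.mean(axis=1)                      # (C, W): pool along height
    z = np.concatenate([z_h, z_w], axis=1)    # (C, H + W)
    f = h_swish(w_shared @ z)                 # (C_mid, H + W): a 1x1 conv is a matmul over channels
    f_h, f_w = f[:, :h], f[:, h:]             # split back into the two directions
    g_h = sigmoid(w_h @ f_h)                  # (C, H): per-row gate in (0, 1)
    g_w = sigmoid(w_w @ f_w)                  # (C, W): per-column gate in (0, 1)
    # y_c(i, j) = x_c(i, j) * g_h_c(i) * g_w_c(j)
    return x * g_h[:, :, None] * g_w[:, None, :]


# Hypothetical sizes: a 128-channel, 16x12 branch (HRNet-W32-like, 256x192 input)
rng = np.random.default_rng(0)
C, H, W, r = 128, 16, 12, 16
c_mid = C // r
x = rng.standard_normal((C, H, W))
w_shared = rng.standard_normal((c_mid, C)) * 0.1
w_h = rng.standard_normal((C, c_mid)) * 0.1
w_w = rng.standard_normal((C, c_mid)) * 0.1

y = coordinate_attention(x, w_shared, w_h, w_w)
print(y.shape)  # (128, 16, 12)
```

Each gate is a sigmoid output, so the module can only rescale features. The rescaling factor at position (i, j) is the product of a row gate and a column gate. This is how the design encodes position information along each axis together with per-channel weighting.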

Key words: human key point detection, Coordinate Attention (CA), self-attention, high-resolution network, channel attention
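The abstract does not state which self-attention formulation CASANet applies on the 1/32-resolution branch. As an assumption, the sketch below uses a generic non-local-style scaled dot-product self-attention over all spatial positions, with a residual connection. The function name `spatial_self_attention`, the projection size `D`, the scale `gamma` and the 256-channel, 8×6 feature map (an HRNet-W32-like 1/32 branch for a 256×192 input) are hypothetical and do not come from the paper.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(x, w_q, w_k, w_v, gamma=1.0):
    """Non-local-style self-attention over all spatial positions of one feature map.

    x:        (C, H, W) feature map
    w_q, w_k: (D, C) 1x1 projections for queries and keys
    w_v:      (C, C) 1x1 projection for values
    gamma:    scale on the attention output before the residual sum
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w)                     # (C, N), one column per position
    q, k, v = w_q @ tokens, w_k @ tokens, w_v @ tokens
    attn = softmax(q.T @ k / np.sqrt(q.shape[0]))   # (N, N): row i weights every position j
    out = v @ attn.T                                 # (C, N): out[:, i] = sum_j attn[i, j] * v[:, j]
    return x + gamma * out.reshape(c, h, w)          # residual connection keeps the input signal


# Hypothetical sizes: a 256-channel, 8x6 branch (HRNet-W32-like, 256x192 input)
rng = np.random.default_rng(0)
C, H, W = 256, 8, 6
D = C // 8
x = rng.standard_normal((C, H, W))
w_q = rng.standard_normal((D, C)) * 0.05
w_k = rng.standard_normal((D, C)) * 0.05
w_v = rng.standard_normal((C, C)) * 0.05

y = spatial_self_attention(x, w_q, w_k, w_v, gamma=0.1)
print(y.shape)  # (256, 8, 6)
```

The attention matrix has one row and one column per spatial position, so its cost grows quadratically with H×W. At the assumed 1/32 resolution it is only 48×48, while the same operation on a higher-resolution branch would be far larger.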
