Computer Engineering, 2025, Vol. 51, Issue (2): 278-288. doi: 10.19678/j.issn.1000-3428.0068375

• Graphics and Image Processing •

基于轻量级高分辨率网络的人体姿态估计算法

Human Pose-Estimation Algorithm Based on Lightweight High-Resolution Network

LIU Shengjie (刘圣杰)1,2, HE Ning (何宁)1,2,*, WANG Xin (王鑫)2, YU Haigang (于海港)1, HAN Wenjing (韩文静)2

  1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
  2. Department of Smart City Academy, Beijing Union University, Beijing 100101, China
  • Received: 2023-09-12 Online: 2025-02-15 Published: 2025-03-25
  • Contact: HE Ning
  • Funding:
    National Natural Science Foundation of China (62272049); National Natural Science Foundation of China (62236006); Key Project of the Beijing Municipal Education Commission (KZ201911417048); Science and Technology Innovation 2030 Major Project "New Generation Artificial Intelligence" (2018AAA0100800); Science and Technology Project of the Beijing Municipal Education Commission (KM202111417009); Science and Technology Project of the Beijing Municipal Education Commission (KM201811417005)

Abstract:

Human pose estimation is widely used in fields such as sports and fitness, gesture control, unmanned supermarkets, and entertainment and gaming, yet the task still faces many challenges. To address the large parameter counts and high computational complexity of current mainstream human pose-estimation networks, LitePose, a lightweight pose-estimation network based on a high-resolution network, is proposed. First, Ghost convolution is used to reduce the number of parameters in the feature-extraction network. Second, a Decoupled Fully Connected (DFC) attention module is adopted to better capture dependencies between pixels at distant spatial positions, which reduces the loss of extracted features caused by the smaller parameter count and improves the accuracy of human keypoint regression. Third, a feature enhancement module is designed to further enhance the features extracted by the backbone network. Finally, a new coordinate decoding method is designed to reduce the error in heatmap decoding and improve the accuracy of keypoint regression. LitePose is validated on the human keypoint detection datasets COCO and MPII and compared with current mainstream methods. The experimental results show that LitePose loses 0.2% in accuracy relative to the baseline network HRNet while using less than one-third of the baseline's parameters. LitePose therefore substantially reduces the number of model parameters at the cost of only a small loss in accuracy.
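
To make the parameter saving from Ghost convolution concrete, the following is a minimal sketch that counts the weights of a Ghost module against a standard convolution. It follows the general Ghost module formulation: a primary convolution produces a fraction of the output channels, and cheap depthwise operations generate the rest. The abstract does not give LitePose's layer configuration, so the channel counts, kernel sizes, and ratio `s` below are illustrative assumptions, and the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in, c_out, k, s=2, d=3):
    """Weight count of a Ghost module (bias ignored).

    A primary k x k convolution produces c_out // s "intrinsic" feature maps.
    Each intrinsic map then yields (s - 1) "ghost" maps through cheap
    d x d depthwise operations. All values here are assumed, not taken
    from the paper.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by the ratio s")
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k      # ordinary convolution
    cheap = intrinsic * (s - 1) * d * d     # depthwise: one d x d filter per ghost map
    return primary + cheap


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 kernels.
std = standard_conv_params(64, 64, 3)
ghost = ghost_module_params(64, 64, 3)
print(std, ghost)  # 36864 18720
```

With these assumed settings the Ghost module needs roughly half the weights of the standard layer. The actual saving in LitePose depends on settings the abstract does not state.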

Key words: human pose estimation, high-resolution network, lightweight network, GhostV2, coordinate decoding
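
The abstract does not describe the new coordinate decoding method. For context, the sketch below shows the conventional baseline that heatmap-based estimators such as HRNet commonly use: take the peak of a keypoint heatmap and shift it a quarter pixel toward the higher neighbouring value. This is not the LitePose decoder. The function name and the toy heatmap are hypothetical.

```python
def decode_heatmap(heatmap):
    """Return (x, y) of the heatmap peak with a quarter-pixel refinement.

    `heatmap` is a list of rows of floats. The peak is moved 0.25 px toward
    whichever horizontal and vertical neighbour has the larger value.
    """
    h, w = len(heatmap), len(heatmap[0])

    # Locate the integer position of the maximum response.
    best_x, best_y = 0, 0
    for y in range(h):
        for x in range(w):
            if heatmap[y][x] > heatmap[best_y][best_x]:
                best_x, best_y = x, y

    def sign(v):
        return (v > 0) - (v < 0)

    fx, fy = float(best_x), float(best_y)
    # Refine only when both neighbours exist along the axis.
    if 0 < best_x < w - 1:
        fx += 0.25 * sign(heatmap[best_y][best_x + 1] - heatmap[best_y][best_x - 1])
    if 0 < best_y < h - 1:
        fy += 0.25 * sign(heatmap[best_y + 1][best_x] - heatmap[best_y - 1][best_x])
    return fx, fy


# Toy 3 x 4 heatmap with its peak at (x=1, y=1).
toy = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.5, 0.0],
    [0.0, 0.3, 0.1, 0.0],
]
print(decode_heatmap(toy))  # (1.25, 1.25)
```

The quarter-pixel shift partly compensates for the quantization introduced when keypoints are predicted on a downsampled heatmap, which is the kind of decoding error the abstract says the new method aims to reduce.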