Computer Engineering ›› 2024, Vol. 50 ›› Issue (11): 276-283. doi: 10.19678/j.issn.1000-3428.0068663

• Graphics and Image Processing •

Local Feature Refinement Action Recognition Method Based on Graph Convolution

HE Zize, ZHAN Yinwei*

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2023-10-23 Published: 2024-11-15 Online: 2024-04-01
  • Contact: ZHAN Yinwei
  • Funding:
    National Natural Science Foundation of China (62272108); Key-Area Research and Development Program of Guangdong Province (2019B010150002); Key-Area Research and Development Program of Guangdong Province (2020B0101130019)

Abstract:

Action recognition is an important research direction in computer vision, yet current mainstream methods pay insufficient attention to local action features. To focus on local action features, some methods split the predefined human skeleton into multiple parts, such as the left and right hands and the left and right legs. However, each of these parts contains few skeleton keypoints, so the resulting action features are similar to one another and recognition accuracy drops. Moreover, existing methods based on local action features do not fully consider global posture features, which makes their recognition accuracy unstable. To address these issues, this study proposes a local feature refinement action recognition method based on graph convolution. The method divides the predefined human skeleton topology graph into the body, the upper limbs, and the lower limbs, strengthening the model's ability to attend to local action features. In addition, a local feature refiner is designed that uses a contrastive learning strategy to enlarge the differences between the local action features of different action classes and to reduce the differences among actions of the same class, resolving the feature similarity caused by the partitioning strategy. On this basis, the classification results of the upper and lower limbs are combined with those of the body, making full use of global posture features to improve model stability. Experimental results show that the method achieves recognition accuracies of 93.0% and 98.8% on the X-Sub and X-View benchmarks of NTU RGB+D 60, respectively, and 88.8% and 90.1% on the X-Sub and X-Set benchmarks of NTU RGB+D 120, respectively, indicating that it effectively improves action recognition accuracy.

Key words: action recognition, contrastive learning, skeleton keypoints, predefined skeleton topology, local feature refinement
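
The abstract names two mechanisms without giving their formulas: a contrastive objective that separates the local features of different action classes, and a fusion of the per-part classification results. The sketch below illustrates one plausible reading of each in plain Python. It is not the authors' implementation. The supervised contrastive form of the loss, the temperature value, the equal fusion weights, and every function and part name (`supervised_contrastive_loss`, `fuse_part_scores`, `body`, `upper_limbs`, `lower_limbs`) are assumptions made for illustration. Part features are assumed to be already pooled into one vector per sample.

```python
import math


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have a positive.

    features: list of equal-length float vectors, one per sample (assumed
              to be pooled features of a single skeleton part).
    labels:   list of integer action-class ids, same length as features.
    """
    # L2-normalise so that dot products become cosine similarities.
    normed = []
    for vec in features:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner is skipped
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n) if j != i
        }
        # Log-sum-exp over all non-anchor samples, stabilised by the max.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fuse_part_scores(part_logits, weights=None):
    """Weighted sum of per-part softmax scores (equal weights by default).

    part_logits: dict mapping a part name to its list of class logits.
    Returns (predicted class index, fused score list).
    """
    names = list(part_logits)
    weights = weights or {name: 1.0 / len(names) for name in names}
    num_classes = len(part_logits[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(part_logits[name])
        for c in range(num_classes):
            fused[c] += weights[name] * probs[c]
    return max(range(num_classes), key=fused.__getitem__), fused
```

Under these assumptions, the loss is small when same-class part features cluster together and large when they are interleaved with other classes, which matches the stated goal of the local feature refiner. Averaging softmax scores lets a confident body branch outvote an ambiguous limb branch, which is one simple way to bring global posture information into the final decision.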