
Computer Engineering ›› 2023, Vol. 49 ›› Issue (10): 264-271. doi: 10.19678/j.issn.1000-3428.0065882

• Development Research and Engineering Application •

Skeleton Action Recognition Based on Adaptive Multi-scale Graph Convolutional Network

Kuan LIU1, Xiaobing XI2, Mingdong ZHOU1,*

  1. Shanghai Key Laboratory of Digital Manufacture for Thin-Walled Structures, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  2. Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200240, China
  • Received: 2022-09-29  Published: 2023-10-15  Online: 2023-01-03
  • Corresponding author: Mingdong ZHOU
  • About the authors:

    Kuan LIU (born 1996), male, master's student; main research interest: pattern recognition

    Xiaobing XI, chief physician, master's degree

  • Funding:
    Shanghai Jiao Tong University Medical-Engineering Interdisciplinary Key Project (YG2019ZDA16)

Abstract:

Graph Convolutional Networks (GCNs) that model the human skeleton as a spatiotemporal topology graph are widely used for skeleton-based action recognition. However, existing GCNs have two limitations: the predefined skeleton graph has a fixed topology, which limits the generalization ability of the model, and the single-branch temporal graph convolution operator extracts spatiotemporal features at only one granularity, which limits its expressive capacity. To address these problems, an adaptive multi-scale graph convolutional network is proposed. The adaptive spatial graph convolution layer treats the skeleton topology as a set of parameters learned end to end, generating a data-driven skeleton graph according to the action. The multi-scale temporal graph convolution layer extends the temporal graph convolution operator to multiple branches and dynamically fuses spatiotemporal features at different temporal granularities of the skeleton sequence. Four streams are combined as inputs to the model: joint, bone, joint motion, and bone motion. Experimental results show that the proposed model achieves recognition accuracies of 90.5% and 96.8% under the Cross-Subject (CS) and Cross-View (CV) settings of the NTU RGB+D 60 action recognition dataset, and 86.0% and 88.7% under the CS and Cross-Setup (CT) settings of the NTU RGB+D 120 action recognition dataset. The model effectively extracts the spatiotemporal features of skeleton actions and improves the classification performance of skeleton-based action recognition.
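The two layers named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the parameterisation, so the additive offset `b_learned`, the softmax fusion over `branch_logits`, and all function names are assumptions made for illustration. Plain Python lists keep the example dependency-free; a real model would use a tensor library and learn these values by backpropagation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def adaptive_spatial_gconv(x: Matrix, a_fixed: Matrix,
                           b_learned: Matrix, w: Matrix) -> Matrix:
    """Spatial graph convolution for one frame with a learnable topology.

    x:         V x C_in joint features
    a_fixed:   V x V predefined skeleton adjacency
    b_learned: V x V trainable offset (assumed parameterisation)
    w:         C_in x C_out channel weights
    Returns V x C_out features computed as (a_fixed + b_learned) @ x @ w.
    """
    topology = [[f + l for f, l in zip(row_f, row_l)]
                for row_f, row_l in zip(a_fixed, b_learned)]
    return matmul(matmul(topology, x), w)


def conv1d_same(series: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded 1-D correlation along time; output length equals input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    padded = [0.0] * half + list(series) + [0.0] * half
    return [sum(k * padded[t + i] for i, k in enumerate(kernel))
            for t in range(len(series))]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multiscale_temporal(series: List[float], kernels: List[List[float]],
                        branch_logits: List[float]) -> List[float]:
    """Run one temporal branch per kernel and fuse them with softmax weights.

    Each kernel length is a different temporal granularity; the fusion
    weights come from branch_logits (an assumed fusion rule).
    """
    weights = softmax(branch_logits)
    branches = [conv1d_same(series, k) for k in kernels]
    return [sum(w * b[t] for w, b in zip(weights, branches))
            for t in range(len(series))]
```

With `b_learned` set to all zeros, `adaptive_spatial_gconv` reduces to a convolution over the fixed skeleton graph; non-zero entries add connections between joints that the predefined topology does not link.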

Key words: human skeleton, action recognition, Graph Convolutional Network (GCN), adaptive, multi-scale
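The four input streams mentioned in the abstract (joint, bone, joint motion, bone motion) can be derived from raw joint coordinates roughly as sketched below. The abstract does not define the streams, so this follows a common convention and should be read as an assumption: a bone is the vector from a joint's parent to the joint, and motion is the difference between consecutive frames, zero-padded at the last frame. The `parents` list and all function names are hypothetical.

```python
from typing import Dict, List

Frame = List[List[float]]   # V joints, each a list of C coordinates
Clip = List[Frame]          # T frames


def bone_stream(joints: Clip, parents: List[int]) -> Clip:
    """Bone vector of joint v = joint v minus its parent joint.

    The root is its own parent, so its bone vector is all zeros.
    """
    return [[[c - p for c, p in zip(frame[v], frame[parents[v]])]
             for v in range(len(frame))]
            for frame in joints]


def motion_stream(clip: Clip) -> Clip:
    """Temporal difference clip[t + 1] - clip[t], zero-padded to keep T frames."""
    if not clip:
        return []
    diffs = [[[b - a for a, b in zip(joint_now, joint_next)]
              for joint_now, joint_next in zip(clip[t], clip[t + 1])]
             for t in range(len(clip) - 1)]
    diffs.append([[0.0] * len(joint) for joint in clip[-1]])
    return diffs


def four_streams(joints: Clip, parents: List[int]) -> Dict[str, Clip]:
    """Build the joint, bone, joint-motion, and bone-motion inputs."""
    bones = bone_stream(joints, parents)
    return {
        "joint": joints,
        "bone": bones,
        "joint_motion": motion_stream(joints),
        "bone_motion": motion_stream(bones),
    }
```

For a toy three-joint chain, `four_streams(clip, parents=[0, 0, 1])` returns four clips of identical shape. How the model combines them (for example, fusing per-stream prediction scores) is not stated in the abstract and is left out of the sketch.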