
Computer Engineering ›› 2024, Vol. 50 ›› Issue (8): 64-74. doi: 10.19678/j.issn.1000-3428.0067977

• Artificial Intelligence and Pattern Recognition •

  • Funding:
    National Natural Science Foundation of China (61976028); Jiangsu Key Laboratory of Image and Video Understanding for Social Safety (J2021-2); Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_3068)

Human Behavior Recognition Based on Multi-Stream Semantic Graph Convolutional Network

Suolan LIU, Yan WANG, Hongyuan WANG*, Shengsheng ZHU

  1. School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou 213000, Jiangsu, China
  • Received: 2023-07-03  Online: 2024-08-15  Published: 2024-03-19
  • Contact: Hongyuan WANG



Abstract:

Compared with image-based behavior recognition methods, recognition based on human skeleton information effectively overcomes the influence of complex backgrounds, illumination changes, and appearance variations. However, mainstream skeleton-based methods suffer from large parameter counts and slow computation. To address these issues, this paper proposes a lightweight Multi-Stream Semantic Graph Convolutional Network (MS-SGN) for behavior recognition. The skeleton information is expressed as three data streams: a bone-length stream, a joint stream, and a fine-grained stream. Spatial features are extracted from each semantics-embedded stream by adaptive graph convolution, and temporal features are extracted by multi-scale temporal convolutions with different kernel sizes and dilation rates. Finally, the classification results of the streams are combined by weighted fusion. The proposed method achieves accuracies of 90.0% (X-Sub) and 95.83% (X-View) on the large-scale NTU60 RGB+D dataset, and 83.4% (X-Sub) and 84.0% (X-View) on NTU120 RGB+D. Comparative experiments demonstrate that it offers better recognition accuracy than mainstream methods such as SGN and Logsin-RNN, while maintaining a more lightweight network framework.
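The idea of multi-scale temporal convolution with different kernel sizes and dilation rates can be illustrated with a toy one-dimensional sketch. This is only a rough illustration of the mechanism named in the abstract; the kernel values, sequence length, and dilation rates below are illustrative assumptions, not the network's actual layers or parameters.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    # 'valid' 1-D convolution where taps are spaced `dilation` frames apart,
    # so the same small kernel covers a wider temporal receptive field.
    k = len(kernel)
    span = (k - 1) * dilation + 1          # frames covered by one kernel placement
    t_out = len(x) - span + 1
    return np.array([
        sum(kernel[j] * x[t + j * dilation] for j in range(k))
        for t in range(t_out)
    ])

def multiscale_temporal(x, kernel, dilations=(1, 2)):
    # One branch per dilation rate; a real network would concatenate or sum
    # the branch outputs, here we simply return them.
    return [dilated_conv1d(x, kernel, d) for d in dilations]

# Toy 8-frame sequence of a single joint coordinate (hypothetical data).
x = np.arange(8.0)
branches = multiscale_temporal(x, np.array([1.0, 0.0, -1.0]), dilations=(1, 2))
```

With the difference-style kernel above, the dilation-1 branch responds to short-range motion (x[t] - x[t+2]) and the dilation-2 branch to longer-range motion (x[t] - x[t+4]), which is the intuition behind mixing dilation rates.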

Key words: behavior recognition, human skeleton, feature fusion, Graph Convolutional Network (GCN), multi-scale
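The weighted late fusion of per-stream classification results described in the abstract can likewise be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions: the stream weights, batch size, and class count are hypothetical, not values from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_streams(stream_logits, weights):
    """Weighted late fusion of per-stream classification scores.

    stream_logits: list of (batch, num_classes) arrays, one per stream
    weights: per-stream fusion weights (assumed fixed here; in practice
             they would be chosen, e.g., on a validation set)
    """
    probs = [w * softmax(s) for s, w in zip(stream_logits, weights)]
    return np.sum(probs, axis=0).argmax(axis=1)  # fused class prediction

# Toy example: three streams (bone-length, joint, fine-grained),
# 2 samples, 4 action classes — all shapes hypothetical.
rng = np.random.default_rng(0)
logits = [rng.normal(size=(2, 4)) for _ in range(3)]
pred = fuse_streams(logits, weights=[1.0, 1.0, 0.5])
```

Fusing softmax scores rather than raw logits keeps each stream's contribution on a comparable scale before the weights are applied.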