Computer Engineering ›› 2025, Vol. 51 ›› Issue (4): 281-292. doi: 10.19678/j.issn.1000-3428.0069044

• Development Research and Engineering Application •

Skeleton Behavior Recognition Based on Extended Temporal and Spatiotemporal Feature Fusion Graph Convolutional Network

XU Yonggang1, SUN Qixuan1,*, LI Fanjia1,2, CHENG Jianwei3, DAI Jiajun1

  1. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
  2. School of Information Engineering (School of Big Data), Xuzhou University of Technology, Xuzhou 221000, Jiangsu, China
  3. School of Safety Engineering, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China
  • Received: 2023-12-18 Online: 2025-04-15 Published: 2024-05-20
  • Contact: SUN Qixuan

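  • Supported by:
    Key Research and Development Special Project of Xinjiang Uygur Autonomous Region (2022B03003-3); National Natural Science Foundation of China (51874299); Fundamental Research Funds for the Central Universities (2020CXNL02); "Industrial Internet of Things and Emergency Collaboration" Innovation Team Funding Program of China University of Mining and Technology (2020ZY002); Xuzhou Science and Technology Program (KC23317)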

Abstract:

In recent years, Graph Convolutional Networks (GCNs) have made significant progress in skeleton-based human behavior recognition. However, most existing GCNs simply chain temporal and spatial convolutions in series, which leads to suboptimal spatiotemporal feature fusion. Existing models also struggle to extract temporal features efficiently. To address these issues, this paper proposes an Extended Temporal and Spatiotemporal Feature Fusion Graph Convolutional Network (ETFF-GCN). The network first fuses dynamic spatial topology with temporal features through channel aggregation, then applies an attention mechanism as a second fusion stage to further strengthen the fused representation. To extract temporal features comprehensively, temporal graph convolutions are built from multiple convolutional kernels of different sizes, capturing multiscale and multigranular temporal features. An effective squeeze-and-excitation module is also introduced for feature enhancement, improving feature representation capability. Experiments on three large datasets show that the proposed approach outperforms existing methods.

Key words: human skeleton behavior recognition, Graph Convolutional Network (GCN), spatiotemporal feature fusion, attention mechanism, extended temporal
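
To make the multiscale temporal idea in the abstract concrete, the following is a minimal sketch and not the authors' ETFF-GCN implementation. It runs one convolution branch per kernel size along the time axis of a single joint coordinate, then re-weights the resulting channels in the squeeze-and-excitation style. The function names, the kernel sizes (3, 5, 9), the averaging kernels that stand in for learned weights, and the single fully connected layer `fc` are all assumptions made for illustration; none of them come from the paper.

```python
import math


def temporal_conv(seq, kernel):
    """'Same'-length 1-D convolution along time with zero padding.

    Assumes an odd kernel length so the window is centered on each frame.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(seq))
    ]


def multiscale_temporal(seq, kernel_sizes=(3, 5, 9)):
    """Run one temporal branch per kernel size; each branch is one channel.

    Uniform averaging kernels are placeholders for learned weights
    (an assumption of this sketch).
    """
    for k in kernel_sizes:
        if k % 2 == 0:
            raise ValueError("kernel sizes must be odd")
    return [temporal_conv(seq, [1.0 / k] * k) for k in kernel_sizes]


def squeeze_excite(channels, fc):
    """Channel re-weighting in the squeeze-and-excitation style.

    Squeeze: average each channel over time.
    Excite: one fully connected layer `fc` (C x C nested list, hypothetical
    weights) followed by a sigmoid gives one gate per channel.
    Scale: multiply every frame of a channel by its gate.
    """
    squeezed = [sum(ch) / len(ch) for ch in channels]
    gates = []
    for row in fc:
        z = sum(w * s for w, s in zip(row, squeezed))
        gates.append(1.0 / (1.0 + math.exp(-z)))
    return [[g * x for x in ch] for g, ch in zip(gates, channels)]


if __name__ == "__main__":
    # One joint coordinate over 8 frames (made-up values).
    joint_x = [0.0, 0.1, 0.4, 0.9, 1.0, 0.6, 0.2, 0.0]
    branches = multiscale_temporal(joint_x)
    # Identity matrix as a stand-in for trained excitation weights.
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for channel in squeeze_excite(branches, identity):
        print([round(v, 3) for v in channel])
```

Small kernels respond to short, fast motions and large kernels to slower ones, which is the intuition behind combining several sizes. A real model would apply this per joint and per feature channel with learned kernels, typically in a tensor library rather than plain lists.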
