
Computer Engineering ›› 2025, Vol. 51 ›› Issue (1): 216-224. doi: 10.19678/j.issn.1000-3428.0068398

• Graphics and Image Processing •

Human Action Recognition Method Based on Action-Time Perception

WANG Xiaolu*(), WEN Jianrong

  1. College of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, Shaanxi, China
  • Received: 2023-09-17  Online: 2025-01-15  Published: 2024-04-11
  • Contact: WANG Xiaolu
  • Funding:
    Xi'an Science and Technology Plan Project (2020KJRC0070)

Abstract:

To address the problems of redundant information in action videos and the sparse channel distribution of action features, a 3D residual network based on action-time perception is proposed. An Action-perception Module (AM) computes temporal differences at the feature level and uses them to excite motion-sensitive channels, thereby extracting motion features. A Temporal attention Module (TM) computes an attention weight matrix along the temporal dimension to capture local temporal features. The outputs of the AM and TM are summed to obtain fused action features, which are then incorporated into a 3D residual network, yielding an Action-Time perception Module (ATM)-based 3D residual network. Experimental results on the public datasets UCF101 and HMDB51 show that the ATM-based 3DResNeXt-101 network improves action recognition accuracy by 1.6% and 2.8%, respectively, over the baseline 3DResNeXt-101 network, demonstrating the feasibility and effectiveness of the proposed method.

Key words: deep learning, action recognition, action perception, temporal attention, 3D residual network
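The abstract's fusion scheme can be sketched numerically: feature-level temporal differences gate motion-sensitive channels (AM), a softmax over the time axis weights frames (TM), and the two outputs are summed. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the channel-gating via sigmoid of pooled differences, the per-frame scoring by global average pooling, and the padding of the last difference are all illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def action_perception(features):
    """AM sketch: feature-level temporal differences excite
    motion-sensitive channels (the gating form is an assumption)."""
    # features: (T, C, H, W)
    diff = features[1:] - features[:-1]             # temporal differences
    diff = np.concatenate([diff, diff[-1:]], 0)     # pad to keep T frames
    gate = sigmoid(diff.mean(axis=(2, 3)))          # (T, C) channel excitation
    return features * gate[:, :, None, None]

def temporal_attention(features):
    """TM sketch: attention weights computed along the time axis."""
    scores = features.mean(axis=(1, 2, 3))          # (T,) per-frame score
    w = np.exp(scores - scores.max())
    w /= w.sum()                                    # softmax over time
    return features * w[:, None, None, None]

def atm(features):
    """ATM sketch: sum the AM and TM outputs to fuse action information."""
    return action_perception(features) + temporal_attention(features)

x = np.random.rand(8, 4, 7, 7).astype(np.float32)   # toy (T, C, H, W) feature map
y = atm(x)                                          # same shape as the input
```

In the paper this fused feature is inserted into the residual branches of a 3DResNeXt-101 backbone; the sketch only shows the shape-preserving AM + TM combination that makes such insertion possible.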