
Computer Engineering, 2025, Vol. 51, Issue (4): 119-128. doi: 10.19678/j.issn.1000-3428.0068936

• Artificial Intelligence and Pattern Recognition •

Residual Behavior Recognition Model Based on Spatio-Temporal Shuffle Attention Mechanism

JIANG Jieping, WANG Mingwen*

  1. School of Mathematics, Southwest Jiaotong University, Chengdu 610000, Sichuan, China
  • Received: 2023-12-01 Online: 2025-04-15 Published: 2024-05-14
  • Contact: WANG Mingwen

  • Funding: National Natural Science Foundation of China (62106206)

Abstract:

This paper presents a residual behavior recognition model based on a Spatio-temporal Shuffle Attention (SAT) mechanism, with the aim of improving the effectiveness of the spatio-temporal features that 3D convolutions extract in deep learning models. SAT is a lightweight, multidimensional hybrid attention mechanism composed of two submodules: a channel-temporal attention submodule and a spatial attention submodule. The channel-temporal submodule adds a temporal dimension to channel attention so that it captures both temporal and channel information. The spatial attention submodule compresses redundant temporal information to increase attention to spatial features. The extracted features then undergo channel shuffling and regrouping, which improves the model's representational ability and reduces the parameter count. The model uses a ResNeXt residual network to extract spatio-temporal features, with the SAT module embedded in its residual blocks, and the attention module learns the weights of different feature maps on its own. The extracted features are thus weighted in the channel, temporal, and spatial domains, enhancing the network's ability to represent human behavior. Focal Loss, an improved cross-entropy function, is used as the loss function to mitigate possible class imbalance in the datasets. Experimental results show that the model achieves recognition accuracies of 96.3% on UCF101 and 71.6% on HMDB51, a significant improvement over the compared models.

Key words: deep learning, behavior recognition, Spatio-temporal Shuffle Attention (SAT), residual network, cross-entropy function
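The abstract names two standard building blocks that can be illustrated independently of the paper's network: channel shuffle and Focal Loss. The following is a minimal pure-Python sketch under two assumptions. First, it assumes SAT uses the usual ShuffleNet-style shuffle: the C channels are viewed as a groups × (C/groups) grid, the grid is transposed, and the result is flattened. Second, it assumes the standard focal loss FL(p_t) = -α(1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The function names, the group counts, and the defaults γ = 2 and α = 1 are illustrative assumptions, since the abstract does not state the values the authors used. The attention submodules themselves are not sketched because their layers are not specified here.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Hypothetical ShuffleNet-style channel shuffle over a flat channel list.

    The channels are viewed as a (groups, channels_per_group) grid; the grid
    is transposed and flattened, so channels from different groups interleave.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("channel count must be a positive multiple of groups")
    per_group = n // groups
    # Grid cell (g, i) holds channels[g * per_group + i]; reading the
    # transposed grid row by row visits the cells in (i, g) order.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def focal_loss(probs: Sequence[float], target: int,
               gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one sample: -alpha * (1 - p_t) ** gamma * log(p_t).

    probs: predicted class probabilities (e.g. softmax output).
    target: index of the true class.
    The gamma and alpha defaults are assumptions, not the paper's settings.
    With gamma = 0 and alpha = 1 this equals ordinary cross-entropy.
    """
    # Clamp p_t away from 0 so log() stays finite.
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
    probs = [0.9, 0.05, 0.05]
    print(focal_loss(probs, 0, gamma=0.0))  # plain cross-entropy
    print(focal_loss(probs, 0))             # down-weighted easy example
```

For example, shuffling six channels in two groups interleaves them as [0, 3, 1, 4, 2, 5]. With γ = 2, a confidently correct prediction (p_t = 0.9) contributes 100 times less loss than under plain cross-entropy, because (1 - 0.9)^2 = 0.01. A hard example (p_t = 0.1) keeps 81% of its cross-entropy loss. Focal loss therefore down-weights easy, often majority-class samples, which is why it is used against class imbalance. On a real video tensor of shape (B, C, T, H, W), the same shuffle is commonly written in PyTorch as x.reshape(B, groups, C // groups, T, H, W).transpose(1, 2).reshape(B, C, T, H, W).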
