Skeleton Action Recognition Based on Multi-Granularity Cross Attention

doi:10.19678/j.issn.1000-3428.0252088

Abstract

Abstract: Skeleton-based action recognition is attracting growing attention because of its strong performance. In this task, coarse-grained features are an important complement to fine-grained features and can effectively improve recognition performance. However, existing multi-granularity skeleton action recognition methods have two shortcomings: first, the coarse-grained features they construct do not accurately retain the structural information among locally adjacent fine-grained joints; second, they do not make good use of the global dependencies between coarse-grained and fine-grained features for feature learning. To address these problems, the proposed method uses the arithmetic mean and a classical convolution operation, respectively, to capture the position and structure information of locally adjacent fine-grained joints when constructing coarse-grained joints. It then uses a cross-attention mechanism to capture the global dependencies between coarse-grained and fine-grained features; while fusing the features, this better describes part-level motion trends and improves the representational ability and discriminability of the coarse-grained features. The method is combined with several skeleton action recognition models, and experiments are conducted under multiple evaluation protocols of the NTU RGB+D and NTU RGB+D 120 action recognition datasets. The experimental results show that the proposed method can extract and fuse skeleton action features at different granularities and significantly improves the classification performance of human skeleton action recognition methods.

摘要： 基于骨架的动作识别方法因其卓越的性能，正受到越来越多的关注。在骨架动作识别任务中，粗粒度特征是细粒度特征的重要补充，可有效提升动作识别方法的性能。现有的多粒度骨架动作识别方法存在不足，一是所构造的粗粒度特征没有精确保留局部相邻细粒度关节点之间的结构信息，二是没有很好地利用粗细粒度特征之间的全局依赖关系进行特征学习。针对以上问题，在构造粗粒度关节点时，分别使用算术平均和经典的卷积操作捕捉局部相邻细粒度关节点的位置和结构信息；使用交叉注意力机制捕捉粗细两种粒度特征之间的全局依赖关系，在特征融合的同时，更好地刻画了部位级运动趋势，提高了粗粒度特征表征能力和鉴别性。将所提方法与多种骨架动作识别模型相结合，并在NTU RGB+D和NTU RGB+D 120动作识别数据集的多个评测标准下进行实验。实验结果表明，所提方法能够提取并融合不同粒度的骨架动作特征，显著提升人体骨架动作识别方法的分类性能。
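The two steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the joint grouping, the shared convolution kernel, the additive combination of the mean and convolution terms, the projection matrices, and the choice of coarse features as queries with fine features as keys and values are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def build_coarse(fine, groups, kernel):
    """Build coarse-grained joints from groups of adjacent fine-grained joints.

    fine:   (V, C) features of V fine-grained joints.
    groups: P index lists of equal length K (hypothetical body-part grouping).
    kernel: (K,) weights shared across channels (learned in a real model).
    Returns (P, C).
    """
    grouped = fine[np.asarray(groups)]  # (P, K, C)
    # Arithmetic mean of each group: where the part is located.
    position = grouped.mean(axis=1)
    # Kernel-size-K, stride-K 1D convolution (deep-learning sense, i.e.
    # cross-correlation) over the joints of each group: how the joints
    # are arranged relative to one another.
    structure = np.einsum("k,pkc->pc", kernel, grouped)
    # Assumption: the two terms are simply added.
    return position + structure


def cross_attention(coarse, fine, w_q, w_k, w_v):
    """Scaled dot-product cross attention: coarse queries attend to fine joints."""
    q = coarse @ w_q  # (P, D)
    k = fine @ w_k    # (V, D)
    v = fine @ w_v    # (V, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (P, V)
    return attn @ v, attn


# Toy example: 6 fine joints with 3 channels, grouped into 2 parts.
rng = np.random.default_rng(0)
V, C, D = 6, 3, 4
fine = rng.normal(size=(V, C))
groups = [[0, 1, 2], [3, 4, 5]]      # hypothetical grouping
kernel = np.array([-1.0, 0.0, 1.0])  # illustrative difference filter
coarse = build_coarse(fine, groups, kernel)
w_q, w_k, w_v = (rng.normal(size=(C, D)) for _ in range(3))
fused, attn = cross_attention(coarse, fine, w_q, w_k, w_v)
print(coarse.shape, fused.shape, attn.shape)
```

Each row of `attn` is a distribution over all fine-grained joints, so every coarse-grained joint can draw on joints outside its own group; this is one way to read the "global dependencies" between the two granularities.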

Wang Yuanyuan, Cao Hui, Wang Tingwei. Skeleton Action Recognition Based on Multi-Granularity Cross Attention[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252088.

王园园, 曹慧, 王廷蔚. 基于多粒度交叉注意力的骨架动作识别方法[J]. 计算机工程, doi: 10.19678/j.issn.1000-3428.0252088.


URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0252088

