Dynamic Gesture Recognition Model Based on 3D Convolutional Neural Network

doi:10.19678/j.issn.1000-3428.0059314

Abstract: Gesture interaction is an important mode of human-computer interaction, and its flexibility and convenience have made gesture recognition a research hotspot in virtual reality, remote control, and other fields. To address the drop in recognition accuracy on gesture videos that lack marker frames, a dynamic gesture recognition method with a hierarchical network structure is proposed. The method uses a gesture detection model as the first-level network and a gesture classification model as the second-level network, so that the recognition task is completed step by step. Because staged processing and the large number of parameters in a 3D convolutional neural network can make model training and inference excessively slow, the 3D convolution kernel is split into a time-domain convolution and a spatial-domain convolution to reduce the model's time cost. Experimental results show that, while maintaining real-time performance, the method reaches a recognition accuracy of 93.35% on the EgoGesture dataset, outperforming C3D, ResNeXt101, MTUT, and other methods, which demonstrates the effectiveness of the proposed method.
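To make the kernel-splitting argument concrete, the following minimal sketch counts the weights of a full 3D convolution against a spatial convolution followed by a temporal one. It is an illustration only, not the authors' implementation: the channel widths (64), the 3 x 3 x 3 kernel size, the spatial-then-temporal ordering, and the function names are all assumptions chosen for the example.

```python
def conv3d_params(c_in, c_out, t, k):
    """Weight count of a full t x k x k 3D convolution (bias omitted)."""
    return c_in * c_out * t * k * k


def split_conv_params(c_in, c_mid, c_out, t, k):
    """Weight count of a 1 x k x k spatial convolution followed by a
    t x 1 x 1 temporal convolution (bias omitted).

    c_mid is the hypothetical channel width between the two stages.
    """
    spatial = c_in * c_mid * k * k   # 1 x k x k kernel per channel pair
    temporal = c_mid * c_out * t     # t x 1 x 1 kernel per channel pair
    return spatial + temporal


# Hypothetical layer: 64 input and output channels, 3 x 3 x 3 kernel.
full = conv3d_params(64, 64, t=3, k=3)
split = split_conv_params(64, 64, 64, t=3, k=3)

print(f"full 3D kernel:  {full} weights")
print(f"split kernels:   {split} weights")
print(f"split / full:    {split / full:.3f}")
```

Under these assumed sizes the split layer needs 49152 weights instead of 110592, a ratio of 4/9. The actual saving in the paper's model depends on its layer widths and kernel sizes, which the abstract does not give.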

Key words: dynamic gesture recognition, hierarchical structure, convolution kernel split, 3D Convolutional Neural Network (CNN), gesture detector

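Abstract (translated from Chinese): Dynamic gesture recognition on gesture videos without marker frames tends to suffer reduced accuracy. A dynamic gesture recognition model with a hierarchical network structure is proposed, in which a gesture detection model serves as the first-level network and a gesture classification model as the second-level network, so that the recognition task is completed step by step. In addition, the 3D convolution kernel is split into a time-domain convolution and a spatial-domain convolution that run in stages, addressing the long training and running times caused by the large number of parameters in 3D convolutional neural networks. Experimental results show that, while maintaining real-time performance, the model reaches a recognition accuracy of 93.35% on the EgoGesture dataset, outperforming C3D, ResNeXt101, MTUT, and other models.

Key words (translated from Chinese): dynamic gesture recognition, hierarchical structure, convolution kernel split, 3D convolutional neural network, gesture detector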

CLC Number: TP391.4

XU Fang, HUANG Jun, CHEN Quan. Dynamic Gesture Recognition Model Based on 3D Convolutional Neural Network[J]. Computer Engineering, 2021, 47(11): 283-291.

徐访, 黄俊, 陈权. 基于3D卷积神经网络的动态手势识别模型[J]. 计算机工程, 2021, 47(11): 283-291.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0059314

https://www.ecice06.com/EN/Y2021/V47/I11/283
