[1] GIRSHICK R,DONAHUE J,DARRELL T,et al.Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2014:580-587. [2] RAUTARAY S S,AGRAWAL A.Vision based hand gesture recognition for human computer interaction:a survey[J].Artificial Intelligence Review,2015,43(1):1-54. [3] CAMGOZ N C,HADFIELD S,KOLLER O,et al.Subunets:end-to-end hand shape and continuous sign language recognition[C]//Proceedings of IEEE International Conference on Computer Vision.Washington D.C.,USA:IEEE Press,2017:3075-3084. [4] SIMONYAN K,ZISSERMAN A.Two-stream convolutional networks for action recognition in videos[EB/OL].[2019-11-01].https://arxiv.org/abs/1406.2199. [5] NEVEROVA N,WOLFC,TAYLOR G W,et al.Multi-scale deep learning for gesture detection and localization[C]//Proceedings of European Conference on Computer Vision.Berlin,Germany:Springer,2014:474-490. [6] MOLCHANOV P,YANG X,GUPTA S,et al.Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural network[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2016:4207-4215. [7] HUANG G,LIU Z,VAN DER M L,et al.Densely connected convolutional networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2017:4700-4708. [8] BAI S,KOLTER J Z,KOLTUN V.An empirical evaluation of generic convolutional and recurrent networks for sequence modeling[EB/OL].[2019-11-01]. https://arxiv.org/abs/1803.01271v1. [9] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[EB/OL].[2019-11-01].https://arxiv.org/abs/1706.03762. [10] HU Jie,SHEN Li,SUN Gang.Squeeze-and-excitation networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2018:7132-7141. [11] KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks[EB/OL].[2019-11-01].https://blog.csdn.net/u011534057/article/details/51318670. [12] WANG Pichao,LI Wanqing,LIU Song,et al.Large-scale isolated gesture recognition using convolutional neural networks[C]//Proceedings of the 23rd International Conference on Pattern Recognition.Washington D.C.,USA:IEEE Press,2016:7-12. [13] TRAN D,BOURDEV L,FERGUS R,et al.Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of IEEE International Conference on Computer Vision.Washington D.C.,USA:IEEE Press,2015:4489-4497. [14] MOLCHANOV P,GUPTA S,KIM K,et al.Hand gesture recognition with 3D convolutional neural networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2015:1-7. [15] PIGOU L,VAN DEN O A,DIELEMAN S,et al.Beyond temporal pooling:recurrence and temporal convolutions for gesture recognition in video[J].International Journal of Computer Vision,2018,126(2/3/4):430-439. [16] SIMONYAN K,ZIAAERMAN A.Very deep convolutional networks for large-scale image recognition[EB/OL].[2019-11-01].https://arXiv.org/abs/1409.1556. [17] SZEGEDY C,LIU W,JIA Y,et al.Going deeper with convolutions[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2015:1-9. [18] IOFFE S,SZEGEDY C.Batch normalization:accelerating deep network training by reducing internal covariate shift[EB/OL].[2019-11-01].https://arXiv preprint arXiv:1502.03167. [19] SZEGEDY C,VANHOUCKE V,IOFFE S,et al.Rethinking the inception architecture for computer vision[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2016:2818-2826. [20] SZEGEDY C,IOFFE S,VANHOUCKE V,et al.Inception-v4,inception-resnet and the impact of residual connections on learning[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence.[S.1.]:AAAI Press,2017:342-356. [21] HE Kaiming,ZHANG Xiaoyuy,REN Shaoqing,et al.Deep residual learning for image recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2016:770-778. [22] LIN Chi,WAN Jun,LIANG Yanyan,et al.Large-scale isolated gesture recognition using a refined fused model based on masked Res-C3D network and skeleton LSTM[C]//Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition.Washington D.C.,USA:IEEE Press,2018:231-246. [23] MIAO Q,LI Y,OUYANG W,et al.Multimodal gesture recognition based on the ResC3D network[C]//Proceedings of IEEE International Conference on Computer Vision Workshop.Washington D.C.,USA:IEEE Press,2017:675-689. [24] ESCALANTE H J,VICTOR P L,WAN J,et al.ChaLearn joint contest on multimedia challenges beyond visual analysis:an overview[C]//Proceedings of the 23rd International Conference on Pattern Recognition.Washington D.C.,USA:IEEE Press,2017:453-468. [25] WAN Jun,LIN Chi,XIE Yiliang,et al.Results and analysis of ChaLearn LAP multi-modal isolated and continuous gesture recognition,and real versus fake expressed emotions challenges[C]//Proceedings of IEEE International Conference on Computer Vision Workshop.[S.1.]:IEEE Computer Society,2017:469-478. [26] ZHANG Tong,WANG Rong,DING Jianwei,et al.Face recognition based on densely connected convolutional networks[C]//Proceedings of the 4th IEEE International Conference on Multimedia Big Data.Washington D.C.,USA:IEEE Press,2018:1-6. [27] ZHANG Jian,ZHANG Yonghui,HE Jingxuan.Action recognition algorithm based on DenseNet and depth motion map[J].Information Technology and Network Security,2020,39(1):63-69.(in Chinese)张健,张永辉,何京璇.基于DenseNet和深度运动图的行为识别算法[J].信息技术与网络安全,2020,39(1):63-69. [28] CAO Chuqing,LI Ruifeng,ZHAO Lijun.Hand posture recognition method based on depth image technology[J].Computer Engineering,2012,38(8):16-18,21.(in Chinese)曹雏清,李瑞峰,赵立军.基于深度图像技术的手势识别方法[J].计算机工程,2012,38(8):16-18,21. [29] YI Sheng,LIANG Huagang,RU Feng.Hand gesture recognition based on multi-column deep 3D convolutional neural network[J].Computer Engineering,2017,43(8):243-248.(in Chinese)易生,梁华刚,茹锋. 基于多列深度3D卷积神经网络的手势识别[J].计算机工程,2017,43(8):243-248. [30] ZHANG Liang,ZHU Guangming,SHEN Peiyi,et al.Learning spatiotemporal features using 3DCNN and convolutional lSTM for gesture recognition[C]//Proceedings of IEEE International Conference on Computer Vision.Washington D.C.,USA:IEEE Press,2017:3120-3128. [31] CHAI Xiujuan,LIU Zhipeng,YIN Fang,et al.Two streams recurrent neural networks for large-scale continuous gesture recognition[C]//Proceedings of the 23rd International Conference on Pattern Recognition.Washington D.C.,USA:IEEE Press,2016:31-36. [32] HOU Jiangxun,WANG Guilin,CHEN Xinghao,et al.Spatial-temporal attention res-TCN for skeleton-based dynamic hand gesture recognition[C]//Proceedings of European Conference on Computer Vision.Berlin,Germany:Springer,2018:562-579. [33] DING Chongyang,LIU Kai,LI Guang,et al.Spatio-temporal weighted posture motion features for human skeleton action recognition research[J].Chinese Journal of Computers,2019,43(1):29-40.(in Chinese)丁重阳,刘凯,李光,等.基于时空权重姿态运动特征的人体骨架行为识别研究[J].计算机学报,2019,43(1):29-40. [34] XIE Zhao,ZHOU Yi,WU Kewei,et al.Activity recognition based on spatial-temporal attention LSTM[EB/OL].[2019-11-01].http://kns.cnki.net/kcms/detail/11.1826.TP.20191227.1658.002.html.(in Chinese)谢昭,周义,吴克伟,等.基于时空关注度LSTM的行为识别[EB/OL].[2019-11-01].http://kns.cnki.net/kcms/detail/11.1826.TP.20191227.1658.002.html. [35] OHN-BAR E,TRIVEDI M M.Hand gesture recognition in real time for automotive interfaces:a multimodal vision-based approach and evaluations[J].IEEE Transactions on Intelligent Transportation Systems,2014,15(6):2368-2377. [36] CARREIRA J,ZISSERMAN A.Quo vadis,action recognition? a new model and the kinetics dataset[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2017:6299-6308. [37] ABAVISANI M,JOZE H R V,PATEL V M.Improving the performance of unimodal dynamic hand-gesture recognition with multimodal training[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2019:1165-1174. [38] WANG H,ONEATA D,VERBEEK J,et al.A robust and efficient video representation for action recognition[J].International Journal of Computer Vision,2016,119(3):219-238. |