Action Recognition Based on Motion Region Difference and Convolutional Neural Network

doi:10.19678/j.issn.1000-3428.0053623

Computer Engineering, 2019, Vol. 45, Issue (12): 274-280, 293.

CHEN Xiaochun¹, LIN Boyi²,³, SUN Qian², ZHANG Kunhua³

1. Key Laboratory of Electronic Design Automation, Research Institute of Tsinghua University in Shenzhen, Shenzhen, Guangdong 518057, China;
2. Peng Cheng Laboratory, Shenzhen, Guangdong 518082, China;
3. College of Electronics and Information Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China

Received: 2019-01-09  Revised: 2019-03-01  Published: 2019-03-15
About the authors: CHEN Xiaochun (born 1972), male, Ph.D., main research interests: machine learning and multimedia information processing; LIN Boyi and SUN Qian, master's students; ZHANG Kunhua, associate professor, Ph.D.
Funding:
Science and Technology Planning Project of Guangdong Province (2016B010126003); Shenzhen Basic Research Project (JCYJ20170816151958999).

Abstract

To address the low data-processing efficiency of video action recognition, this paper builds an action recognition model based on inter-frame difference sequences. Inter-frame differencing is used to detect the motion region in each video frame, and the image is then cropped and enhanced around that region. The overall recognition model adopts a dual-stream architecture, and when data samples are built, differences are taken between frames separated by a suitable interval, which widens the time span each sample covers. The number of training samples is increased gradually in stages, which improves recognition performance and counters the overfitting that tends to occur during training. Experimental results show that the model performs fast action recognition on a computer with only CPU-level hardware, and that its recognition accuracy is above 85% on both the UCF11 and UCF25 datasets.

Key words: inter-frame difference, action recognition, dual-stream architecture, Convolutional Neural Network (CNN), motion region
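The preprocessing steps named in the abstract (inter-frame differencing, locating the motion region, cropping around it, and differencing frames that are several frames apart) can be illustrated with a minimal sketch. This page does not give the paper's actual procedure, so everything below is an assumption made for illustration: the function names, the fixed threshold of 25, the bounding-box centre as the "motion region centre", and the square crop window are all hypothetical. The enhancement step and the dual-stream CNN are omitted. The sketch uses only the Python standard library and represents a grayscale frame as a list of rows; a practical implementation would more likely use an array or image library.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def frame_difference(a: Frame, b: Frame) -> Frame:
    """Absolute per-pixel difference between two equally sized frames."""
    return [[abs(pa - pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def motion_center(diff: Frame, threshold: int = 25) -> Optional[Tuple[int, int]]:
    """Centre (row, col) of the bounding box of pixels above `threshold`.

    Returns None when no pixel exceeds the threshold (no motion found).
    The threshold value is an illustrative assumption.
    """
    rows = [r for r, row in enumerate(diff) if any(v > threshold for v in row)]
    cols = [c for row in diff for c, v in enumerate(row) if v > threshold]
    if not rows:
        return None
    return ((min(rows) + max(rows)) // 2, (min(cols) + max(cols)) // 2)


def crop_around(frame: Frame, center: Tuple[int, int], size: int) -> Frame:
    """Crop a size x size window centred on `center`.

    The window is shifted to stay inside the frame; this assumes `size`
    does not exceed the frame height or width.
    """
    height, width = len(frame), len(frame[0])
    top = min(max(center[0] - size // 2, 0), height - size)
    left = min(max(center[1] - size // 2, 0), width - size)
    return [row[left:left + size] for row in frame[top:top + size]]


def difference_sequence(frames: List[Frame], stride: int, size: int,
                        threshold: int = 25) -> List[Frame]:
    """Difference frames `stride` apart and crop each result around its motion.

    A stride above 1 skips frames, so a sample of the same length spans
    a longer stretch of the video. Pairs with no detected motion are skipped.
    """
    crops = []
    for i in range(len(frames) - stride):
        diff = frame_difference(frames[i], frames[i + stride])
        center = motion_center(diff, threshold)
        if center is not None:
            crops.append(crop_around(diff, center, size))
    return crops


def make_frame(col: int, height: int = 8, width: int = 8) -> Frame:
    """Synthetic test frame: a bright 2x2 block on a black background."""
    frame = [[0] * width for _ in range(height)]
    for r in (3, 4):
        for c in (col, col + 1):
            frame[r][c] = 200
    return frame


# The block moves one pixel to the right in each successive frame.
frames = [make_frame(c) for c in range(5)]
crops = difference_sequence(frames, stride=2, size=4)
```

With `stride=2` the five synthetic frames yield three cropped difference images, each 4 x 4 pixels and positioned over the moving block. Raising `stride` trades temporal resolution for a longer time span per sample, which is the effect the abstract attributes to interval frame differencing.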

CLC number: TP391

CHEN Xiaochun, LIN Boyi, SUN Qian, ZHANG Kunhua. Action Recognition Based on Motion Region Difference and Convolutional Neural Network[J]. Computer Engineering, 2019, 45(12): 274-280,293.

https://www.ecice06.com/CN/Y2019/V45/I12/274

Figures/Tables: 11


