Fast Video Action Detection Based on Action Subject Detection

Computer Engineering ›› 2019, Vol. 45 ›› Issue (12): 257-262. doi: 10.19678/j.issn.1000-3428.0053184

ZHANG Jiehao, CHEN Huajie, YAO Qinwei, HOU Xinyu

School of Automation (School of Artificial Intelligence), Hangzhou Dianzi University, Hangzhou 310018, China

Received: 2018-11-20 Revised: 2018-12-27 Published: 2019-03-07
About the authors: ZHANG Jiehao (born 1994), male, master's student, whose main research interests are video detection, pattern recognition, and machine learning; CHEN Huajie, professor, Ph.D.; YAO Qinwei and HOU Xinyu, master's students.

Abstract

Existing video action detection methods generate candidate regions with a sliding-window operation, which makes them slow on long videos. To address this problem, a fast detection method is proposed that localizes the static action subject. A long video is first divided into several video units, and the Fast R-CNN algorithm is applied to the first frame of each unit to detect the action subject. For the units in which an action subject is detected, time regions are delimited to generate candidate regions where an action may occur, which reduces the input data of the action detection network. On this basis, a 3D Convolutional Neural Network (CNN) classifies the candidate regions, and boundary regression is performed on the regions classified as actions to obtain an accurate temporal localization of each action. Experimental results show that the proposed method improves detection speed by more than 2 times compared with the TURN method, while its mAP decreases by only 0.7%.

Key words: action detection, action subject detection, boundary regression, 3D Convolutional Neural Network (CNN), video unit
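
The candidate-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `candidate_regions` and `kept_fraction`, the `has_subject` callback (a stand-in for running Fast R-CNN on one frame), and the rule that adjacent positive units are merged into a single region are all assumptions made for illustration. The abstract only states that time regions are delimited for units in which a subject is detected.

```python
from typing import Callable, List, Tuple

Region = Tuple[int, int]  # half-open frame interval [start, end)


def candidate_regions(
    num_frames: int,
    unit_len: int,
    has_subject: Callable[[int], bool],
) -> List[Region]:
    """Return candidate temporal regions for a video of num_frames frames.

    has_subject(frame_index) is a hypothetical hook standing in for an
    object detector applied to a single frame.
    """
    regions: List[Region] = []
    for start in range(0, num_frames, unit_len):
        end = min(start + unit_len, num_frames)
        # Only the first frame of each unit is inspected.
        if not has_subject(start):
            continue
        if regions and regions[-1][1] == start:
            # Assumption: adjacent positive units are merged into one region.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


def kept_fraction(regions: List[Region], num_frames: int) -> float:
    """Fraction of frames that would still be passed to the action network."""
    return sum(end - start for start, end in regions) / num_frames


if __name__ == "__main__":
    # Toy detector: a subject is visible only in frames 32..95.
    toy_detector = lambda i: 32 <= i < 96
    found = candidate_regions(160, 16, toy_detector)
    print(found, kept_fraction(found, 160))
```

In this toy run only 10 frames are examined by the detector (one per 16-frame unit), and the downstream classifier would receive the merged region rather than every sliding window over the full video, which is the source of the speed-up the abstract reports.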

CLC number: TP753

ZHANG Jiehao, CHEN Huajie, YAO Qinwei, HOU Xinyu. Fast Video Action Detection Based on Action Subject Detection[J]. Computer Engineering, 2019, 45(12): 257-262.

http://www.ecice06.com/CN/Y2019/V45/I12/257

Figures/Tables: 11

