[1] GUPTA A,SRINIVASAN P,SHI Jianbo,et al.Under-standing videos,constructing plots learning a visually grounded storyline model from annotated videos[C]//Proceedings of CVPR'09.Washington D.C.,USA:IEEE Press,2009:2012-2019. [2] AGGARWAL J K,RYOO M S.Human activity analysis:a review[J].ACM Computing Surveys,2011,43(3):16. [3] SHOU Zheng,WANG Dongang,CHANG Shih-Fu.Temporal action localization in untrimmed videos via multi-stage CNNs[C]//Proceedings of CVPR'16.Washington D.C.,USA:IEEE Press,2016:1049-1058. [4] PINEDA F J.Generalization of back-propagation to recurrent neural networks[J].Physical Review Letters,1987,59(19):2229-2232. [5] WILLIAMS R J,ZIPSER D.A learning algorithm for continually running fully recurrent neural networks[J].Neural Computation,1989,1(2):270-280. [6] JI Shuiwang,XU Wei,YANG Ming,et al.3D convolutional neural networks for human action recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(1):221-231. [7] TRAN D,BOURDEV L,FERGUS R,et al.Learning spatiotemporal features with 3D convolutional networks[C]//Proceedings of IEEE International Conference on Computer Vision.Washington D.C.,USA:IEEE Press,2015:4489-4497. [8] ESCORCIA V,HEILBRON F C,NIEBLES J C,et al.Daps:deep action proposals for action understanding[C]//Proceedings of European Conference on Computer Vision.Berlin,Germany:Springer,2016:768-784. [9] HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780. [10] KRISHNA R,HATA K,REN F,et al.Dense-captioning events in videos[C]//Proceedings of IEEE International Conference on Computer Vision.Washington D.C.,USA:IEEE Press,2017:706-715. [11] YEUNG S,RUSSAKOVSKY O,MORI G,et al.End-to-end learning of action detection from frame glimpses in videos[C]//Proceedings of CVPR'16.Washington D.C.,USA:IEEE Press,2016:2678-2687. [12] JIANG Yugang,LIU Jingen,ZAMIR A R,et al.THUMOS challenge:action recognition with a large number of classes[EB/OL].[2018-11-01].https://www.crcv.ucf.edu/THUMOS14/. [13] GIRSHICK R.Fast R-CNN[C]//Proceedings of IEEE International Conference on Computer Vision.Washington D.C.,USA:IEEE Press,2017:1440-1448. [14] LIN Fengxiao,CHEN Huajie,YAO Qinwei,et al.Target fast detection algorithm based on hybrid structure convolutional neural network[J].Computer Engineering,2018,44(12):222-227.(in Chinese)林封笑,陈华杰,姚勤炜,等.基于混合结构卷积神经网络的目标快速检测算法[J].计算机工程,2018,44(12):228-233. [15] GAO Jiyang,YANG Zhenheng,CHEN Kan,et al.TURN TAP:temporal unit regression network for temporal action proposals[C]//Proceedings of IEEE International Conference on Computer Vision.Washington D.C.,USA:IEEE Press,2017:3628-3636. [16] GIRSHICK R,DONAHUE J,DARRELL T,et al.Rich featzure hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of IEEE CVPR'14.Washington D.C.,USA:IEEE Press,2014:580-587. [17] REN Shaoqing,HE Kaiming,GIRSHICK R,et al.Faster R-CNN:towards real-time object detection with region proposal networks[C]//Proceedings of Advances in Neural Information Processing Systems.[S.l.]:Neural Information Processing Systems,Inc.,2015:91-99. [18] NOWOZIN S.Optimal decisions from probabilistic models:the intersection-over-union case[C]//Proceedings of CVPR'14.Washington D.C.,USA:IEEE Press,2014:548-555. [19] HINTON G E,SALAKHUTDINOV R R.Replicated softmax:an undirected topic model[C]//Proceedings of Advances in Neural Information Processing Systems.[S.l.]:Neural Information Processing Systems,Inc.,2009:1607-1614. [20] HEILBRON F C,ESCORCIA V,GHANEM B,et al.Activitynet:a large-scale video benchmark for human activity understanding[C]//Proceedings of CVPR'15.Washington D.C.,USA:IEEE Press,2015:961-970. [21] SOOMRO K,ZAMIR A R,SHAH M.UCF101:a dataset of 101 human actions classes from videos in the wild[EB/OL].[2018-11-01].http://crcv.ucf.edu/data/UCF101.php. |