Computer Engineering, 2019, Vol. 45, Issue (12): 274-280, 293. doi: 10.19678/j.issn.1000-3428.0053623

• Multimedia Technology and Application •

Action Recognition Based on Motion Region Difference and Convolutional Neural Network

CHEN Xiaochun1, LIN Boyi2,3, SUN Qian2, ZHANG Kunhua3

  1. Key Laboratory of Electronic Design Automation, Research Institute of Tsinghua University in Shenzhen, Shenzhen, Guangdong 518057, China;
  2. Peng Cheng Laboratory, Shenzhen, Guangdong 518082, China;
  3. College of Electronics and Information Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
  • Received: 2019-01-09 Revised: 2019-03-01 Published: 2019-03-15
  • About the authors: CHEN Xiaochun (born 1972), male, Ph.D., main research interests: machine learning and multimedia information processing; LIN Boyi and SUN Qian, master's students; ZHANG Kunhua, associate professor, Ph.D.
  • Supported by:
    Science and Technology Planning Project of Guangdong Province (2016B010126003); Shenzhen Basic Research Project (JCYJ20170816151958999).

Abstract: To address the low data-processing efficiency of video action recognition, this paper proposes an action recognition model based on inter-frame difference sequences. Inter-frame differencing is used to detect the motion region in each video frame, and the image is cropped and enhanced around that region. The model adopts a dual-stream architecture, and when samples are constructed, differences are taken between frames a suitable interval apart to extend the time span that each sample covers. The number of training samples is increased gradually in stages to improve recognition performance and to counter the overfitting that tends to arise during training. Experimental results show that the model performs fast action recognition on a CPU-class computer, with recognition accuracy above 85% on both the UCF11 and UCF25 datasets.

Key words: inter-frame difference, action recognition, dual-stream architecture, Convolutional Neural Network (CNN), motion region
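As an illustration of the preprocessing idea summarized in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes grayscale frames stored as nested lists of 0-255 intensities. The function names, the threshold of 25, the frame interval `step`, and the square crop size are all hypothetical choices made for the example. The image enhancement step and the dual-stream CNN itself are not shown.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities (assumed layout)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def frame_difference(a: Frame, b: Frame) -> Frame:
    """Pixel-wise absolute difference between two frames of equal size."""
    return [[abs(pa - pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def motion_bounding_box(diff: Frame, threshold: int = 25) -> Optional[Box]:
    """Bounding box of pixels whose difference exceeds the (assumed) threshold."""
    rows = [r for r, row in enumerate(diff) if any(v > threshold for v in row)]
    cols = [c for row in diff for c, v in enumerate(row) if v > threshold]
    if not rows:
        return None  # no motion detected between the two frames
    return min(rows), min(cols), max(rows), max(cols)


def crop_around_motion(frame: Frame, box: Box, size: int) -> Frame:
    """Crop a size x size window centred on the motion box, clamped to the frame."""
    top, left, bottom, right = box
    cy, cx = (top + bottom) // 2, (left + right) // 2
    height, width = len(frame), len(frame[0])
    y0 = min(max(cy - size // 2, 0), max(height - size, 0))
    x0 = min(max(cx - size // 2, 0), max(width - size, 0))
    return [row[x0:x0 + size] for row in frame[y0:y0 + size]]


def difference_sequence(frames: List[Frame], step: int = 2) -> List[Frame]:
    """Differences between frames `step` apart, widening each sample's time span."""
    return [frame_difference(frames[i], frames[i + step])
            for i in range(len(frames) - step)]
```

With `step=1` the sequence reduces to ordinary consecutive-frame differencing. Larger values of `step` cover a longer stretch of the video with the same number of difference images, which is the effect the abstract attributes to differencing across skipped frames.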

CLC Number: