
Computer Engineering, 2019, Vol. 45, Issue (1): 259-263. doi: 10.19678/j.issn.1000-3428.0048978

• Graphics and Image Processing •

Human Action Recognition Algorithm Based on 3D Convolution Neural Network

ZHANG Rui1,2, LI Qishen2, CHU Jun2

  1. School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China; 2. Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang 330063, China
  • Received: 2017-10-16 Online: 2019-01-15 Published: 2019-01-15
  • About the authors: ZHANG Rui (born 1993), female, master's student, whose main research interests are image processing and pattern recognition; LI Qishen, associate professor, Ph.D.; CHU Jun, professor, Ph.D., doctoral supervisor.
  • Supported by:

    National Natural Science Foundation of China (61663031); Natural Science Foundation of Jiangxi Province (20132BAB201046); Graduate Innovation Special Fund of Nanchang Hangkong University (YC2016009)

Abstract:

The diversity of human actions, cluttered scenes, camera motion, and changing viewpoints all make human action recognition difficult. To address this, the paper proposes a human action recognition algorithm based on a 3D convolutional neural network. Each group of 16 consecutive video frames forms one input. Five channels are computed from the frames (grayscale, x-direction gradient, y-direction gradient, x-direction optical flow, and y-direction optical flow) and used to train the network parameters. Five 3D convolution layers and five 3D pooling layers then enrich the extracted features with action information along the time dimension, and two fully connected layers followed by a softmax classifier produce the recognition result. Experiments on the UCF101 dataset show that, compared with the iDT, P-CNN, and LRCN algorithms, the proposed algorithm achieves higher recognition accuracy and runs faster.

Key words: human action recognition, multi-channel, 3D convolution, 3D pooling, time dimension
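The pipeline summarized in the abstract (a 5-channel, 16-frame input passed through five 3D convolution and five 3D pooling stages) can be illustrated by tracing tensor shapes. The sketch below is not the authors' implementation. The abstract fixes only the channel count (5), the clip length (16 frames), and the number of layers. The 112x112 frame size, the 3x3x3 shape-preserving convolutions, the filter counts, and the pooling windows are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, int, int, int]  # (channels, frames, height, width)


def conv3d_shape(shape: Shape, out_channels: int, kernel: int = 3,
                 padding: int = 1, stride: int = 1) -> Shape:
    """Output shape of a 3D convolution with a cubic kernel."""
    _, t, h, w = shape

    def out(n: int) -> int:
        return (n + 2 * padding - kernel) // stride + 1

    return (out_channels, out(t), out(h), out(w))


def pool3d_shape(shape: Shape, window: Tuple[int, int, int]) -> Shape:
    """Output shape of non-overlapping 3D pooling (stride equals window)."""
    c, t, h, w = shape
    kt, kh, kw = window
    return (c, t // kt, h // kh, w // kw)


def trace_shapes(input_shape: Shape, filters: Sequence[int],
                 pools: Sequence[Tuple[int, int, int]]) -> List[Shape]:
    """Apply conv + pool stages in order and record the shape after each stage."""
    shapes = [input_shape]
    shape = input_shape
    for out_channels, window in zip(filters, pools):
        shape = pool3d_shape(conv3d_shape(shape, out_channels), window)
        if min(shape) < 1:
            raise ValueError(f"a dimension collapsed to zero: {shape}")
        shapes.append(shape)
    return shapes


# From the abstract: 5 channels (gray, gradient-x, gradient-y, flow-x, flow-y)
# and 16 frames per clip. ASSUMPTION: 112x112 frames.
INPUT_SHAPE: Shape = (5, 16, 112, 112)
# ASSUMPTION: filter counts per convolution layer (not given in the abstract).
FILTERS = [64, 128, 256, 256, 256]
# ASSUMPTION: the first pooling stage leaves the time axis untouched.
POOLS = [(1, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2)]

if __name__ == "__main__":
    for stage, shape in enumerate(trace_shapes(INPUT_SHAPE, FILTERS, POOLS)):
        print(stage, shape)
```

Under these assumptions the final feature map has shape (256, 1, 3, 3), which would be flattened before the two fully connected layers. The sketch also shows one constraint on any such design: a 16-frame clip can be halved along the time axis at most four times, so at least one of the five pooling stages has to leave the time dimension unchanged. Passing five (2, 2, 2) windows makes `trace_shapes` raise a `ValueError`.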

CLC number: