Computer Engineering (计算机工程)

• Graphics and Image Processing

基于多列深度3D卷积神经网络的手势识别

易生,梁华刚,茹锋   

  1. (长安大学 电子与控制工程学院,西安710064)
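  • Received: 2016-06-06 Publication date: 2017-08-15 Release date: 2017-08-15
  • About the authors: YI Sheng (born 1992), male, master's degree, main research interests are image processing and pattern recognition; LIANG Huagang, associate professor; RU Feng, professor.
  • Funding:

    National Natural Science Foundation of China, Young Scientists Fund (61203374); Natural Science Foundation of Shaanxi Province, International Cooperation Project (2014KW01-05).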

Hand Gesture Recognition Based on Multi-column Deep 3D Convolutional Neural Network

YI Sheng, LIANG Huagang, RU Feng

  1. (School of Electronics and Control Engineering, Chang’an University, Xi’an 710064, China)
  • Received: 2016-06-06 Online: 2017-08-15 Published: 2017-08-15

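Abstract:

When a traditional 2D convolutional neural network extracts features from consecutive video frames, it tends to lose the target's motion information along the time axis, which lowers recognition accuracy. To address this, a hand gesture recognition method based on a multi-column deep 3D convolutional neural network (3D CNN) is proposed. 3D convolution kernels are convolved over consecutive frames to extract the target's temporal and spatial features and so capture its motion information. To avoid misclassification caused by insufficient feature extraction in a single 3D CNN, several 3D CNN structures with strong classification ability are trained and combined into a multi-column deep 3D CNN. This structure weighs the outputs of the individual 3D CNNs and takes the category with the largest weight as the final output. Experimental results show that, when applied to hand gesture recognition on the CHGDs dataset, the multi-column deep 3D CNN reaches a recognition rate of 95.09%, which is nearly 7% higher than a single 3D CNN and nearly 20% higher than a traditional 2D CNN, and that it recognizes targets in continuous image sequences well.

Keywords: video image sequence processing, hand gesture recognition, deep learning, feature extraction, convolutional neural network, moving object recognition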
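The 3D convolution step mentioned in the abstract can be illustrated with a small sketch. It is not the authors' network: the function name `conv3d_valid`, the toy clip, and the hand-written temporal-difference kernel are all assumptions chosen for illustration. The point is only that a kernel spanning more than one frame responds to motion, which a per-frame 2D kernel cannot see.

```python
def conv3d_valid(clip, kernel):
    """'Valid' 3D cross-correlation (the operation CNN layers call convolution).

    clip   : nested list indexed as clip[t][y][x]      (frames, rows, cols)
    kernel : nested list indexed as kernel[dt][dy][dx]
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over a small space-time volume, not just one frame.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical toy clip: 3 frames of 3x4 pixels, one bright pixel in row 1
# that moves one column to the right in each frame.
clip = [[[1.0 if (y == 1 and x == t) else 0.0 for x in range(4)]
         for y in range(3)]
        for t in range(3)]

# Hand-written kernel spanning 2 frames: (next frame) minus (current frame).
kernel = [[[-1.0]], [[1.0]]]

response = conv3d_valid(clip, kernel)
print(response[0][1])  # [-1.0, 1.0, 0.0, 0.0]
```

The response is negative where the pixel was and positive where it moved to. Feeding the same frame three times gives an all-zero response, so the output depends on motion across frames rather than on appearance alone. A trained 3D CNN learns its kernels instead of using a fixed one like this.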

Abstract:

When a traditional 2D Convolutional Neural Network (CNN) extracts features from video with consecutive frames, it easily loses the target's motion information along the time axis, resulting in low recognition accuracy. To solve this problem, a hand gesture recognition method based on a multi-column deep 3D CNN is proposed. 3D convolution kernels are used to extract temporal and spatial features and thereby capture the object's motion information. To avoid misclassification caused by the insufficient feature information of a single 3D CNN, the multi-column 3D CNN is composed of several 3D CNNs, each of which has strong classification ability. The structure weighs the outputs of the individual 3D CNNs, and the category with the maximum weight is taken as the final result. The multi-column 3D CNN is applied to hand gesture recognition on the CHGD dataset. Experimental results show that the method achieves a recognition rate of 95.09%, which is nearly 7% higher than that of a single 3D CNN and nearly 20% higher than that of a traditional 2D CNN, and that it has good recognition ability for video image sequences.

Key words: video image sequence processing, hand gesture recognition, deep learning, feature extraction, Convolutional Neural Network (CNN), moving object recognition
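The multi-column decision rule in the abstract (weigh the outputs of several 3D CNNs, then pick the category with the largest weight) can be sketched as follows. The abstract does not give the exact weighting scheme, so this sketch assumes a weighted average of per-column class probabilities, with equal weights by default. The function name `fuse_columns` and the example probabilities are hypothetical and are not results from the paper.

```python
def fuse_columns(column_probs, column_weights=None):
    """Combine class probabilities from several columns and pick the top class.

    column_probs   : list of per-column probability lists, one value per class
    column_weights : optional per-column weights (assumed: equal weights)
    Returns (predicted_class_index, fused_scores).
    """
    n_cols = len(column_probs)
    n_classes = len(column_probs[0])
    if column_weights is None:
        column_weights = [1.0 / n_cols] * n_cols

    scores = [0.0] * n_classes
    for probs, weight in zip(column_probs, column_weights):
        for c, p in enumerate(probs):
            scores[c] += weight * p

    # The class with the largest fused score is the final decision.
    best = max(range(n_classes), key=lambda c: scores[c])
    return best, scores


# Made-up outputs of three columns for one clip and three gesture classes.
columns = [
    [0.50, 0.40, 0.10],  # this column alone would pick class 0
    [0.20, 0.70, 0.10],
    [0.30, 0.60, 0.10],
]
label, scores = fuse_columns(columns)
print(label)  # 1
```

In this toy case the first column on its own picks class 0, while the fused decision is class 1. This is the kind of single-network error the multi-column structure is meant to reduce.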

CLC number: