Static Gesture Recognition Algorithm Based on Attention Mechanism and Feature Fusion

doi:10.19678/j.issn.1000-3428.0060912

Abstract

Convolutional neural networks are widely used in gesture recognition, but existing networks represent features insufficiently, which lowers recognition accuracy. This study proposes a lightweight static gesture recognition algorithm, r-mobilenetv2. Channel attention and spatial attention are connected in series, and the feature maps output by the two are added linearly through a skip connection, which yields a new attention mechanism. For feature fusion, a one-dimensional convolution adjusts the channel dimension of the low-level features so that they match the upsampled high-level features in both the spatial and channel dimensions. The two are then added linearly, and the sum, after a convolution, is concatenated with the high-level features along the channel dimension. The proposed attention mechanism and feature fusion are combined and applied to an improved version of the lightweight network MobileNetV2 to obtain r-mobilenetv2. Experimental results show that, compared with MobileNetV2, r-mobilenetv2 has 27% fewer parameters and an error rate that is 1.82 percentage points lower.

Key words: attention mechanism, feature fusion, hand gesture recognition, image classification, lightweight network
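The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the internal layers, so the parameter-free gates (`channel_gate`, `spatial_gate`), the 1x1 pointwise convolution standing in for the channel-adjusting convolution, the 2x nearest-neighbour upsampling, and the stride-2 step that brings the sum back to the high-level resolution are all assumptions made only so that the shapes line up. All function and variable names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(x):
    # x: (C, H, W). One weight per channel from global average pooling.
    # Assumption: parameter-free gate; the learned layers are not given in the abstract.
    return sigmoid(x.mean(axis=(1, 2), keepdims=True))   # (C, 1, 1)


def spatial_gate(x):
    # One weight per pixel from the mean over channels (also an assumption).
    return sigmoid(x.mean(axis=0, keepdims=True))        # (1, H, W)


def serial_attention_with_skip(x):
    f_c = channel_gate(x) * x        # channel attention output
    f_s = spatial_gate(f_c) * f_c    # spatial attention applied in series
    return f_c + f_s                 # skip connection: add the two outputs


def pointwise_conv(x, w):
    # 1x1 convolution that only mixes channels. w: (C_out, C_in).
    return np.einsum("oc,chw->ohw", w, x)


def fuse(low, high, w_adjust, w_post):
    # low: (C_low, 2H, 2W), high: (C_high, H, W).
    low_adj = pointwise_conv(low, w_adjust)               # match channel dimension
    high_up = high.repeat(2, axis=1).repeat(2, axis=2)    # match spatial dimension
    summed = low_adj + high_up                            # linear addition
    # Assumption: a stride-2 pointwise convolution returns to (H, W).
    reduced = pointwise_conv(summed[:, ::2, ::2], w_post)
    return np.concatenate([reduced, high], axis=0)        # concatenate along channels


rng = np.random.default_rng(0)
low = rng.standard_normal((16, 8, 8))     # hypothetical low-level feature map
high = rng.standard_normal((32, 4, 4))    # hypothetical high-level feature map
w_adjust = rng.standard_normal((32, 16))
w_post = rng.standard_normal((32, 32))

attended = serial_attention_with_skip(high)
fused = fuse(low, high, w_adjust, w_post)
print(attended.shape, fused.shape)
```

In this sketch the attention output keeps the shape of its input, while the fused map has the channels of the convolved sum followed by the untouched high-level channels. How the two parts are wired into MobileNetV2 is not stated in the abstract, so the demo calls them independently.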

CLC number: TP391.4

HU Zongcheng, ZHOU Yatong, SHI Baojun, HE Hao. Static Gesture Recognition Algorithm Based on Attention Mechanism and Feature Fusion[J]. Computer Engineering, 2022, 48(4): 240-246.

https://www.ecice06.com/CN/Y2022/V48/I4/240

Figures/Tables: 12


