Computer Engineering ›› 2024, Vol. 50 ›› Issue (3): 208-215. doi: 10.19678/j.issn.1000-3428.0067439

• Graphics and Image Processing •

Research on Multiview Convolutional Gesture Recognition with Fusion Attention Mechanism

Wentao YUAN (袁文涛)1, Wentao WEI (卫文韬)2, Demin GAO (高德民)1,*

  1. College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, Jiangsu, China
  2. School of Design Art and Media, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
  • Received: 2023-04-20 Published: 2024-03-15 Released online: 2023-08-25
  • Contact: Demin GAO
  • Supported by:
    National Natural Science Foundation of China (62002171); Natural Science Foundation of Jiangsu Province (BK20200464)

Abstract:

Gesture recognition based on surface electromyography (sEMG) plays an important role in human-computer interaction. However, because sEMG is nonlinear and stochastic, improving the accuracy of gesture recognition from sparse multichannel sEMG is difficult. To this end, this paper proposes a multiview convolutional gesture recognition model that incorporates an attention mechanism. First, a multiview input is constructed by extracting a classical sEMG feature set with a 200 ms sliding window. Second, Efficient Channel Attention (ECA) weights the multiview features along the channel dimension, strengthening effective features and weakening ineffective ones. Finally, multiview convolution extracts high-level features from the attention-weighted myoelectric features, and a high-level feature fusion module fuses them to reduce data dimensionality and improve model robustness. The model was trained and evaluated on three public EMG datasets, NinaPro DB1, NinaPro DB5, and NinaPro DB7. Its average recognition accuracy over 200 ms sliding sampling windows was 87.98%, 94.97%, and 89.67%, respectively; its average voting accuracy over entire gesture movements was 97.38%, 98.41%, and 97.09%, respectively; and its average information transfer rate was 1308.71 bit/min. Compared with traditional machine learning methods and recent state-of-the-art deep learning gesture recognition methods, the proposed model achieves higher recognition accuracy in both unimodal sEMG and multimodal gesture recognition, which verifies its effectiveness and generality.
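The abstract reports two accuracies: one per 200 ms window and one per whole gesture after voting. The sketch below illustrates how the two relate, under stated assumptions. The sampling rate, the 50 ms step, the three features (mean absolute value, root mean square, waveform length), and all function names are illustrative choices, not details taken from the paper, which only says that a classical sEMG feature set is used.

```python
from collections import Counter
from math import sqrt


def sliding_windows(signal, fs_hz, window_ms=200, step_ms=50):
    """Yield fixed-length windows from one sEMG channel (a list of samples).

    window_ms=200 follows the abstract; step_ms is an assumed value.
    """
    win = int(fs_hz * window_ms / 1000)
    step = int(fs_hz * step_ms / 1000)
    for start in range(0, len(signal) - win + 1, step):
        yield signal[start:start + win]


def window_features(window):
    """Three common time-domain sEMG features (assumed, not the paper's set)."""
    mav = sum(abs(x) for x in window) / len(window)            # mean absolute value
    rms = sqrt(sum(x * x for x in window) / len(window))       # root mean square
    wl = sum(abs(b - a) for a, b in zip(window, window[1:]))   # waveform length
    return [mav, rms, wl]


def majority_vote(window_labels):
    """Collapse per-window predictions into one label for the whole gesture."""
    return Counter(window_labels).most_common(1)[0][0]


# Synthetic 400-sample channel at an assumed 1000 Hz sampling rate.
signal = [(-1) ** i * 0.5 for i in range(400)]
windows = list(sliding_windows(signal, fs_hz=1000))
print(len(windows))                  # 5 windows of 200 samples each
print(window_features(windows[0]))   # [0.5, 0.5, 199.0]

# Hypothetical per-window classifier outputs for one gesture repetition.
print(majority_vote([3, 3, 7, 3, 7]))  # 3
```

Because a few misclassified windows are outvoted by the rest, the per-gesture voting accuracy can exceed the per-window accuracy, which is consistent with the figures quoted above.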

Key words: surface electromyography (sEMG), gesture recognition, feature extraction, attention mechanism, multiview convolution
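The channel-weighting step named in the abstract, Efficient Channel Attention, can be sketched in a few lines. In its general form, ECA averages each channel to a single descriptor, runs a small 1D convolution across neighboring channel descriptors, applies a sigmoid, and rescales each channel by the resulting weight. The version below is a minimal plain-Python illustration of that general idea. The kernel values, the kernel size, and the function names are hypothetical; in a trained model the kernel is learned, and the paper's actual configuration is not given in the abstract.

```python
from math import exp


def eca_weights(channel_means, kernel):
    """Sigmoid of a zero-padded 1D convolution across channel descriptors.

    kernel must have odd length so every channel keeps a centered neighborhood.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        z = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + exp(-z)))
    return weights


def eca(feature_maps, kernel):
    """Rescale each channel (a list of values) by its attention weight."""
    means = [sum(ch) / len(ch) for ch in feature_maps]  # global average pooling
    weights = eca_weights(means, kernel)
    return [[w * v for v in ch] for w, ch in zip(weights, feature_maps)]


# Two toy channels and a hypothetical kernel that only looks at the channel itself.
feature_maps = [[0.0, 0.0], [2.0, 2.0]]
weighted = eca(feature_maps, kernel=[0.0, 1.0, 0.0])
print(weighted)
```

With this identity-like kernel, each channel's weight is simply the sigmoid of its own mean. A learned kernel with nonzero neighbors would let adjacent channels influence each other's weights, which is the local cross-channel interaction ECA is designed to capture.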