
Computer Engineering ›› 2025, Vol. 51 ›› Issue (8): 107-119. doi: 10.19678/j.issn.1000-3428.0070202

• Artificial Intelligence and Pattern Recognition •

Sign Language Recognition Using Data Gloves Based on EWBiLSTM-ATT

WU Donghui1, WANG Jinfeng1, QIU Sen2, LIU Guozhi1   

  1. College of Building Environment Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, Henan, China;
    2. School of Control Science and Engineering, Dalian University of Technology, Dalian 116081, Liaoning, China
  • Received: 2024-08-05  Revised: 2024-10-16  Online: 2025-08-15  Published: 2024-12-13

  • Corresponding author: WU Donghui, E-mail: w_donghui@163.com
  • Funding: National Natural Science Foundation of China (62272081); Henan Provincial Key Science and Technology Program (222102210086, 232102321021, 252102210093); Key Scientific Research Project of Higher Education Institutions of Henan Province (25B413005).

Abstract: Sign language recognition has received widespread attention in recent years. However, existing sign language recognition models suffer from long training times and high computational costs. To address this issue, this study proposes a hybrid deep learning method based on data from a wearable data glove, the EWBiLSTM-ATT model, which integrates an attention mechanism with an Expanded Wide-kernel Deep Convolutional Neural Network (EWDCNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network. First, widening the first convolutional layer reduces the number of model parameters and speeds up computation, while deepening the EWDCNN convolutional layers improves the model's ability to automatically extract sign language features. Second, a BiLSTM is introduced as a temporal model to capture the dynamic temporal information in sign language sequences, effectively handling the temporal relationships in the sensor data. Finally, an attention mechanism learns a parameter matrix that assigns different weights to the BiLSTM hidden states and forms their weighted sum; by computing an attention weight for each time step, the model automatically selects the key time segments related to the gesture. A data glove acquisition platform is built around an STM32F103 main control module with MPU6050 and Flex Sensor 4.5 sensors as the core components, and 16 dynamic sign language actions are selected to construct the GR-Dataset for training. Under the same experimental conditions, the EWBiLSTM-ATT model achieves a recognition rate of 99.40%, which is 10.36, 8.41, 3.87, and 3.05 percentage points higher than those of the CLT-net, CNN-GRU, CLA-net, and CNN-GRU-ATT models, respectively, while its total training time is reduced to 57%, 61%, 55%, and 56% of theirs, respectively.
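To make the described pipeline concrete, the following is a minimal PyTorch sketch of the EWBiLSTM-ATT idea: a wide first convolutional kernel followed by deeper, narrower convolutions (the EWDCNN front end), a BiLSTM over the resulting feature sequence, and an attention-weighted sum of the BiLSTM hidden states. All layer sizes, kernel widths, the 11-channel input (6 MPU6050 axes plus 5 flex sensors), and the additive attention formulation are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of an EWBiLSTM-ATT-style model; hyperparameters are assumptions.
import torch
import torch.nn as nn

class EWBiLSTMATT(nn.Module):
    def __init__(self, in_channels=11, num_classes=16, hidden=64):
        super().__init__()
        # EWDCNN-style front end: wide first kernel, then deeper small-kernel layers.
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=64, stride=8, padding=28),
            nn.BatchNorm1d(16), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # BiLSTM temporal model over the convolutional feature sequence.
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        # Attention scores over BiLSTM hidden states, one scalar per time step.
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                          # x: (batch, channels, time)
        h = self.cnn(x).transpose(1, 2)            # (batch, time', features)
        h, _ = self.bilstm(h)                      # (batch, time', 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention weight per time step
        ctx = (w * h).sum(dim=1)                   # weighted sum of hidden states
        return self.fc(ctx)                        # class logits for 16 signs
```

In this sketch the attention weights play the role described in the abstract: time segments whose hidden states receive larger weights dominate the pooled representation, so the classifier focuses on the portions of the glove signal most relevant to the gesture.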

Key words: Expanded Wide-kernel Deep Convolutional Neural Network (EWDCNN), Bidirectional Long Short-Term Memory (BiLSTM) network, attention module, sign language recognition, data glove, deep learning

