
Computer Engineering ›› 2025, Vol. 51 ›› Issue (8): 107-119. doi: 10.19678/j.issn.1000-3428.0070202

• Artificial Intelligence and Pattern Recognition •

Sign Language Recognition Using Data Gloves Based on EWBiLSTM-ATT

WU Donghui1, WANG Jinfeng1, QIU Sen2, LIU Guozhi1   

  1. College of Building Environment Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, Henan, China;
    2. School of Control Science and Engineering, Dalian University of Technology, Dalian 116081, Liaoning, China
  • Received: 2024-08-05  Revised: 2024-10-16  Online: 2025-08-15  Published: 2024-12-13

  • Corresponding author: WU Donghui, E-mail: w_donghui@163.com
  • Funding: National Natural Science Foundation of China (62272081); Henan Provincial Key Science and Technology Program (222102210086, 232102321021, 252102210093); Key Scientific Research Project of Higher Education Institutions of Henan Province (25B413005).

Abstract: Sign language recognition has received widespread attention in recent years. However, existing sign language recognition models suffer from long training times and high computational costs. To address this issue, this study proposes a hybrid deep learning method based on data from a wearable data glove, the EWBiLSTM-ATT model, which integrates an attention mechanism with an Expanded Wide-kernel Deep Convolutional Neural Network (EWDCNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network. First, widening the first convolutional layer reduces the number of model parameters and speeds up computation, while deepening the EWDCNN convolutional layers improves the model's ability to automatically extract sign language features. Second, a BiLSTM is introduced as a temporal model to capture the dynamic temporal information in sign language sequences, effectively handling the temporal relationships in the sensor data. Finally, an attention mechanism learns a parameter matrix that assigns different weights to the BiLSTM hidden states and forms their weighted sum; by computing an attention weight for each time step, the model automatically selects the key time segments related to the gesture. A data glove acquisition platform is built around an STM32F103 main control module with MPU6050 and Flex Sensor 4.5 sensors as the core components, and 16 dynamic sign language actions are selected to construct the GR-Dataset for training. Under the same experimental conditions, the EWBiLSTM-ATT model achieves a recognition rate of 99.40%, which is 10.36, 8.41, 3.87, and 3.05 percentage points higher than those of the CLT-net, CNN-GRU, CLA-net, and CNN-GRU-ATT models, respectively, while its total training time is reduced to 57%, 61%, 55%, and 56% of theirs, respectively.
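To make the described pipeline concrete, the following is a minimal PyTorch sketch of the EWBiLSTM-ATT idea: a wide first convolutional kernel followed by deeper, narrower convolutions (the EWDCNN front end), a BiLSTM over the resulting feature sequence, and an attention-weighted sum of the BiLSTM hidden states. All layer sizes, kernel widths, the 11-channel input (6 MPU6050 axes plus 5 flex sensors), and the additive attention formulation are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of an EWBiLSTM-ATT-style model; hyperparameters are assumptions.
import torch
import torch.nn as nn

class EWBiLSTMATT(nn.Module):
    def __init__(self, in_channels=11, num_classes=16, hidden=64):
        super().__init__()
        # EWDCNN-style front end: wide first kernel, then deeper small-kernel layers.
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=64, stride=8, padding=28),
            nn.BatchNorm1d(16), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm1d(32), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # BiLSTM temporal model over the convolutional feature sequence.
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        # Attention scores over BiLSTM hidden states, one scalar per time step.
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                          # x: (batch, channels, time)
        h = self.cnn(x).transpose(1, 2)            # (batch, time', features)
        h, _ = self.bilstm(h)                      # (batch, time', 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention weight per time step
        ctx = (w * h).sum(dim=1)                   # weighted sum of hidden states
        return self.fc(ctx)                        # class logits for 16 signs
```

In this sketch the attention weights play the role described in the abstract: time segments whose hidden states receive larger weights dominate the pooled representation, so the classifier focuses on the portions of the glove signal most relevant to the gesture.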

Key words: Expanded Wide-kernel Deep Convolutional Neural Network (EWDCNN), Bidirectional Long Short-Term Memory (BiLSTM) network, attention module, sign language recognition, data glove, deep learning

