Handwritten Mathematical Expression Recognition With Deformable Dilated Convolution and Triplet Attention

doi:10.19678/j.issn.1000-3428.0253486

Abstract

Abstract: Handwritten mathematical expression recognition (HMER) is an important computer vision task with applications in intelligent education, industry, and related fields. Existing encoder-decoder HMER models typically extract features with standard convolutions and conventional attention mechanisms. However, the fixed-grid sampling of standard convolution ignores the geometric deformation of handwritten symbols, which leads to a high error rate on visually similar characters. In addition, conventional attention mechanisms interact along a single dimension, which limits their ability to capture long-range structural dependencies in complex expressions. To address these issues, this paper proposes DDTAFF, an encoder-decoder model that integrates deformable dilated convolution and triplet attention feature fusion. Deformable dilated convolution incorporates learnable dilation rates into both the offset learning and the customized convolution layer of deformable convolution, enabling more accurate offset prediction and adaptive expansion of the receptive field. Triplet attention feature fusion uses a similarity-guided dynamic fusion strategy to strengthen cross-dimensional interaction, avoiding the limited single-dimension interaction of conventional attention. In the encoder, deformable dilated convolution enlarges the receptive field to capture multi-scale features and broader contextual information, while triplet attention feature fusion integrates features from different levels to strengthen the extraction of key features. The decoder adopts a Transformer-based structure to strengthen long-range dependency modeling. On the CROHME 2014, CROHME 2016, CROHME 2019, and HME100K public datasets, the model achieves recognition accuracies of 59.34%, 59.77%, 59.63%, and 68.94%, respectively, which are 2.34, 3.71, 4.75, and 1.63 percentage points higher than the baseline model. These results support the effectiveness of the proposed method.
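
The sampling idea behind deformable dilated convolution can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes the standard deformable-convolution sampling rule, in which each kernel tap is read at centre + dilation × grid position + offset with bilinear interpolation, and it treats the dilation rate and the offsets as given numbers rather than learned ones. The function names `bilinear` and `deformable_dilated_conv_at` are hypothetical and chosen only for this illustration.

```python
import math


def bilinear(img, y, x):
    """Bilinearly sample the 2D list `img` at fractional (y, x).

    Positions outside the image contribute zero.
    """
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value


def deformable_dilated_conv_at(img, kernel, cy, cx, dilation, offsets):
    """Response of a 3x3 kernel centred at (cy, cx).

    Tap k is read at centre + dilation * grid_k + offsets[k].
    `dilation` may be fractional (a stand-in for a learned rate) and
    `offsets[k]` is a (dy, dx) pair (a stand-in for a predicted offset).
    """
    out = 0.0
    k = 0
    for gy in (-1, 0, 1):
        for gx in (-1, 0, 1):
            dy, dx = offsets[k]
            sample = bilinear(img, cy + dilation * gy + dy, cx + dilation * gx + dx)
            out += kernel[gy + 1][gx + 1] * sample
            k += 1
    return out


# Toy inputs (assumptions for the demo): a 5x5 ramp image and a mean filter.
image = [[5.0 * i + j for j in range(5)] for i in range(5)]
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
no_offsets = [(0.0, 0.0)] * 9
shift_down = [(0.5, 0.0)] * 9  # every tap moved half a row downwards

plain = deformable_dilated_conv_at(image, mean_kernel, 2, 2, 1.0, no_offsets)
wide = deformable_dilated_conv_at(image, mean_kernel, 2, 2, 1.5, no_offsets)
shifted = deformable_dilated_conv_at(image, mean_kernel, 2, 2, 1.0, shift_down)
print(plain, wide, shifted)
```

With zero offsets and a dilation of 1 the function reduces to an ordinary 3x3 convolution at one position. Raising the dilation widens the area covered by the same nine taps, and non-zero offsets move individual taps off the regular grid, which is the behaviour the abstract attributes to the encoder's convolution.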

LIU Xiangbin, ZHU Youhua, PENG Feng. Handwritten Mathematical Expression Recognition With Deformable Dilated Convolution and Triplet Attention[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0253486.

