
Computer Engineering ›› 2025, Vol. 51 ›› Issue (8): 364-372. doi: 10.19678/j.issn.1000-3428.0069275

• Development Research and Engineering Application •

Optical Chemical Structure Recognition Based on Multi-order Gated Aggregation Network

LIN Fan, LI Jianhua*

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2024-01-22 Revised: 2024-04-15 Published: 2025-08-15 Online: 2024-06-26
  • Corresponding author: LI Jianhua
  • Funding:
    National Science and Technology Major Project (2018ZX09735002)

Abstract:

In the field of Optical Chemical Structure Recognition (OCSR), existing deep-learning-based models typically rely on Convolutional Neural Networks (CNNs) or Vision Transformers for visual feature extraction and on Transformers for sequence decoding. Although these models are effective, they remain limited by their image feature extraction ability and by the precision of positional encoding during decoding, which in turn affects recognition efficiency. To address these limitations, this study applies to OCSR an encoder-decoder architecture composed of a Multi-order Gated Aggregation Network (MogaNet) and a Transformer with relative positional encoding, and proposes an optical chemical structure recognition model based on MogaNet. First, during image feature extraction, the model captures multiscale features and reduces feature redundancy through the MogaNet spatial aggregation module, and improves diversity along the channel dimension through the MogaNet channel aggregation module. Second, during sequence decoding, a Transformer with relative positional encoding serves as the decoder to accurately capture the relative positional relationships between the words of the sequence. To train and validate the model, a chemical structure dataset containing 400,000 molecules is constructed, covering both Markush and non-Markush structures. Experimental results show that the model achieves an accuracy of 92.36%, outperforming other existing models.
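The abstract does not say which form of relative positional encoding the decoder uses. As an illustration only, the following minimal sketch shows one common variant: a scalar bias, indexed by the clipped offset between query and key positions, is added to each attention score. The function names, the `bias_table` parameter, and the clipping distance `max_dist` are assumptions made for this example and are not taken from the paper; a real decoder would also apply a causal mask and use multiple heads.

```python
import math


def relative_index(i, j, max_dist):
    """Map the offset j - i, clipped to [-max_dist, max_dist], to a table index."""
    offset = max(-max_dist, min(max_dist, j - i))
    return offset + max_dist


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_relative_bias(q, k, v, bias_table, max_dist):
    """Single-head attention with an additive relative-position bias.

    q, k, v are lists of vectors (one per token); bias_table holds
    2 * max_dist + 1 scalars, one per clipped relative offset.
    Hypothetical illustration, not the paper's implementation.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            dot = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            # The bias depends only on the offset j - i, not on absolute positions.
            scores.append(dot + bias_table[relative_index(i, j, max_dist)])
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    tokens = [[0.0, 0.0]] * 3            # identical queries/keys: only the bias matters
    values = [[1.0], [2.0], [3.0]]
    table = [0.0, 10.0, 0.0, 0.0, 0.0]   # strongly favour offset -1 (the previous token)
    print(attention_with_relative_bias(tokens, tokens, values, table, max_dist=2))
```

Because the bias is looked up by offset rather than by absolute index, the same table entry is reused wherever two tokens are the same distance apart, which is the property that lets a decoder model relative rather than absolute positions.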

Key words: Optical Chemical Structure Recognition (OCSR), encoder-decoder architecture, deep learning, SMILES notation, Multi-order Gated Aggregation Network (MogaNet)