Optical Chemical Structure Recognition Based on Multi-order Gated Aggregation Network

doi:10.19678/j.issn.1000-3428.0069275

Abstract

In Optical Chemical Structure Recognition (OCSR), existing deep-learning models typically rely on Convolutional Neural Networks (CNNs) or Vision Transformers for visual feature extraction and on Transformers for sequence decoding. Although effective, these models remain limited by their image feature extraction capability and by the precision of positional encoding during decoding, which affects recognition efficiency. To address these limitations, this study applies to OCSR an encoder-decoder architecture composed of a Multi-order Gated Aggregation Network (MogaNet) and a Transformer with relative positional encoding, and proposes an OCSR model based on MogaNet. During image feature extraction, the model uses the MogaNet spatial aggregation module to capture multi-scale features and reduce feature redundancy, and the MogaNet channel aggregation module to improve diversity along the channel dimension. During sequence decoding, a Transformer with relative positional encoding serves as the decoder and captures the relative positional relationships between tokens in the sequence. To train and validate the model, a chemical structure dataset of 400,000 molecules is constructed, covering both Markush and non-Markush structures. Experimental results show that the model achieves an accuracy of 92.36%, outperforming other existing models.

Keywords: Optical Chemical Structure Recognition (OCSR), encoder-decoder architecture, deep learning, SMILES notation, Multi-order Gated Aggregation Network (MogaNet)
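
The abstract does not say which form of relative positional encoding the decoder uses. As a minimal sketch only, the NumPy example below assumes one common variant: a learned scalar bias, indexed by the clipped offset between two token positions, added to the attention logits of a single causal self-attention head. The function names, the `bias_table` parameter, and the clipping distance `max_distance` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def relative_position_index(seq_len, max_distance):
    """Return clipped offsets (j - i), shifted so they index a bias table."""
    positions = np.arange(seq_len)
    offsets = positions[None, :] - positions[:, None]  # offsets[i, j] = j - i
    clipped = np.clip(offsets, -max_distance, max_distance)
    return clipped + max_distance  # values lie in [0, 2 * max_distance]


def causal_attention_with_relative_bias(q, k, v, bias_table, max_distance):
    """Single-head causal self-attention with a relative position bias.

    q, k, v: arrays of shape (seq_len, d).
    bias_table: array of shape (2 * max_distance + 1,), one scalar per
        clipped offset (an assumed parameterization, learned in practice).
    """
    seq_len, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    # The bias depends only on the offset j - i, not on absolute positions.
    logits = logits + bias_table[relative_position_index(seq_len, max_distance)]
    # Causal mask: a decoder position attends only to itself and earlier tokens.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    logits = np.where(future, -np.inf, logits)
    # Numerically stable softmax over the key dimension.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d, max_distance = 6, 8, 3
    q, k, v = (rng.normal(size=(seq_len, d)) for _ in range(3))
    bias_table = rng.normal(size=(2 * max_distance + 1,))
    out, weights = causal_attention_with_relative_bias(
        q, k, v, bias_table, max_distance
    )
    print(out.shape, weights.shape)
```

Because the bias is looked up by offset rather than by absolute position, the same table entry is reused for every pair of tokens at the same distance, which is the property that lets a decoder model relative token order in a SMILES sequence.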

LIN Fan, LI Jianhua. Optical Chemical Structure Recognition Based on Multi-order Gated Aggregation Network[J]. Computer Engineering, 2025, 51(8): 364-372.

https://www.ecice06.com/CN/Y2025/V51/I8/364

Figures/Tables: 12

Fig.1 Overall architecture of the model

Fig.2 Architecture of MogaNet

Fig.3 Spatial aggregation module

Fig.4 Channel aggregation module

Fig.5 Structure of the Transformer

Fig.6 Examples of dataset images

Fig.7 Accuracy curves of each model on the validation set

