Chemical Structure Image Recognition Based on Dual Attention Mechanism

doi:10.19678/j.issn.1000-3428.0055881

Abstract

Abstract: Chemical structure image recognition methods built on traditional image processing techniques and pipelined workflows usually rely on hand-crafted features, which results in low recognition accuracy. To address this problem, this paper proposes a chemical structure image recognition method based on a spatial attention mechanism and a channel attention mechanism. The method treats chemical structure recognition as a sequence generation task and uses a deep neural network that combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network to convert chemical structure images into SMILES sequences. The model consists of an encoder and a decoder: the encoder uses the CNN to extract features from chemical structure images, and the decoder combines the dual attention mechanism with the LSTM network to generate SMILES sequences. Experimental results show that, with a Beam Size of 3, the method achieves a recognition accuracy of 81.63% and a BLEU-4 value of 0.937, clearly outperforming chemical structure image recognition methods that use no attention mechanism or only a single attention mechanism.

Key words: chemical structure image recognition, Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, dual attention mechanism, sequence generation
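
The abstract does not give the attention formulas, so the following is only a minimal sketch of how one decoding step might combine channel attention and spatial attention over CNN features. The function name `dual_attention`, the channel-then-spatial ordering, and the use of plain dot products in place of learned projection layers are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_attention(features, hidden):
    """One hypothetical dual-attention step.

    features: C x N nested list (C channels, N spatial locations) from the CNN encoder.
    hidden:   length-C decoder state. Assumption: it has the same size as C so that
              dot products can stand in for the learned projections a real model uses.
    Returns (context, alpha, beta): the context vector for the LSTM, the spatial
    weights over locations, and the channel weights.
    """
    num_channels = len(features)
    num_locations = len(features[0])

    # Channel attention: score each channel by its mean activation and the state.
    channel_scores = [
        sum(channel) / num_locations * h for channel, h in zip(features, hidden)
    ]
    beta = softmax(channel_scores)
    weighted = [[b * v for v in channel] for channel, b in zip(features, beta)]

    # Spatial attention: score each location on the channel-reweighted features.
    spatial_scores = [
        sum(weighted[c][i] * hidden[c] for c in range(num_channels))
        for i in range(num_locations)
    ]
    alpha = softmax(spatial_scores)

    # Context vector: attention-weighted sum over locations, one value per channel.
    context = [sum(a * v for a, v in zip(alpha, channel)) for channel in weighted]
    return context, alpha, beta


# Toy input: 2 channels, 3 spatial locations.
features = [[0.1, 0.9, 0.2], [0.4, 0.3, 0.8]]
hidden = [1.0, 0.5]
context, alpha, beta = dual_attention(features, hidden)
```

In a full encoder-decoder model, `context` would be fed to the LSTM together with the previous SMILES token embedding at each step, and the scores would come from trained layers rather than raw dot products.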

CLC number: TP391

JI Xiuyi, LI Jianhua. Chemical Structure Image Recognition Based on Dual Attention Mechanism[J]. Computer Engineering, 2020, 46(9): 213-220.

http://www.ecice06.com/CN/Y2020/V46/I9/213

Figures/Tables: 10

