
Computer Engineering ›› 2023, Vol. 49 ›› Issue (2): 37-45. doi: 10.19678/j.issn.1000-3428.0065762

• Hot Topics and Reviews •

  • About the authors: WANG Chundong (born 1969), male, professor and doctoral supervisor; his research interests include network information security and pervasive computing. SUN Jiaqi is a master's student. YANG Wenjun is an associate professor.
  • Funding:
    Joint Fund of the National Natural Science Foundation of China (U1536122); National Key Research and Development Program "Science and Technology Boosting the Economy 2020" Key Special Project (SQ2020YFF0413781); Major Special Project of the Tianjin Science and Technology Commission (15ZXDSGX00030); Scientific Research Program of the Tianjin Municipal Education Commission (2021YJSB252).

Method for Generating Chinese Text Adversarial Examples Based on Rectification Understanding

WANG Chundong1,2, SUN Jiaqi1,2, YANG Wenjun1,2   

  1. School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China;
    2. National Engineering Laboratory for Computer Virus Prevention and Control Technology, Tianjin 300384, China
  • Received: 2022-09-16 Revised: 2022-10-21 Published: 2022-11-23


Abstract: Natural Language Processing (NLP) technology has shown strong performance in text classification, text error correction, and other tasks. However, it is vulnerable to adversarial examples, which degrade the classification accuracy of deep learning models. An effective defense against adversarial attacks is adversarial training, but adversarial training requires a large amount of high-quality adversarial example data. Because Chinese adversarial examples are currently relatively scarce, this study proposes a detectable black-box method, WordIllusion, for generating adversarial examples. In the data processing and calculation module, the input data, with punctuation removed, is fed into the text classification model to obtain classification confidence; the confidence is then passed to the CKSFM calculation function, and the keywords in the sentence are selected by computing and comparing cksf values. In the keyword replacement module, the keywords are replaced with similar words from the glyph embedding space and the homophone library to build a candidate sequence of adversarial examples; the sequence is then fed back into the data processing and calculation module to compute cksf values, and the candidate with the highest cksf value is selected as the final adversarial example. Experimental results show that the Attack Success Rate (ASR) of adversarial examples generated by WordIllusion is higher than that of the baseline methods on most deep learning models; on the Deep Pyramid Convolutional Neural Network (DPCNN) model in the news classification scenario, it exceeds the CWordAttack method by up to 41.73 percentage points. In addition, the generated adversarial examples are highly similar to the original text and therefore exhibit strong deceptiveness and generalization.
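The keyword-selection and replacement loop described in the abstract can be sketched as follows. This is a minimal illustration only: the paper's actual CKSFM scoring function, glyph embedding space, and homophone library are not specified on this page, so toy stand-ins are assumed throughout (a generic `classify` callable, a simple confidence-drop score, and a caller-supplied candidate dictionary).

```python
import string

# Hypothetical sketch of the WordIllusion pipeline described in the abstract.
# `classify` is any function mapping text to the model's confidence in the
# original class; `candidates` maps a token to its glyph-similar / homophone
# replacements. Both are assumptions, not the authors' implementation.

CHINESE_PUNCT = "，。！？；：、“”‘’（）"

def strip_punctuation(text):
    """Data-processing step: remove punctuation before querying the model."""
    table = str.maketrans("", "", string.punctuation + CHINESE_PUNCT)
    return text.translate(table)

def cksf(confidence_original, confidence_perturbed):
    """Toy stand-in for the CKSFM score: the drop in the model's confidence
    in the original class caused by a perturbation."""
    return confidence_original - confidence_perturbed

def select_keyword(tokens, classify):
    """Rank tokens by cksf value (confidence drop when the token is removed)
    and return the index of the most influential one."""
    base = classify("".join(tokens))
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + tokens[i + 1:]
        scores.append((cksf(base, classify("".join(perturbed))), i))
    return max(scores)[1]

def word_illusion(tokens, classify, candidates):
    """Replace the keyword with the candidate (glyph-similar or homophone)
    that yields the highest cksf value, i.e. the largest confidence drop."""
    base = classify("".join(tokens))
    idx = select_keyword(tokens, classify)
    best = max(
        candidates.get(tokens[idx], [tokens[idx]]),
        key=lambda c: cksf(base, classify("".join(tokens[:idx] + [c] + tokens[idx + 1:]))),
    )
    return tokens[:idx] + [best] + tokens[idx + 1:]
```

In practice the single-keyword step would be iterated until the model's prediction flips or a perturbation budget is exhausted; the sketch shows one round of the greedy select-then-replace loop.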

Key words: deep neural network, Natural Language Processing(NLP), text classification, adversarial example, rectification understanding
