
Computer Engineering ›› 2023, Vol. 49 ›› Issue (2): 37-45. doi: 10.19678/j.issn.1000-3428.0065762

• Hot Topics and Reviews •

  • About the authors: WANG Chundong (born 1969), male, professor and doctoral supervisor; his research interests include network information security and pervasive computing. SUN Jiaqi is a master's student. YANG Wenjun is an associate professor.
  • Funding:
    Joint Fund of the National Natural Science Foundation of China (U1536122); National Key Research and Development Program "Science and Technology Boosting the Economy 2020" Key Special Project (SQ2020YFF0413781); Major Special Project of the Tianjin Science and Technology Commission (15ZXDSGX00030); Scientific Research Program of the Tianjin Municipal Education Commission (2021YJSB252).

Method for Generating Chinese Text Adversarial Examples Based on Rectification Understanding

WANG Chundong1,2, SUN Jiaqi1,2, YANG Wenjun1,2   

  1. School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China;
    2. National Engineering Laboratory for Computer Virus Prevention and Control Technology, Tianjin 300384, China
  • Received: 2022-09-16 Revised: 2022-10-21 Published: 2022-11-23


Abstract: Natural Language Processing (NLP) technology has shown strong performance in text classification, text error correction, and other tasks. However, it is vulnerable to adversarial examples, which degrade the classification accuracy of deep learning models. An effective defense against adversarial attacks is adversarial training, but adversarial training requires a large amount of high-quality adversarial example data. Because Chinese adversarial examples are currently relatively scarce, this study proposes a detectable black-box method, WordIllusion, for generating adversarial examples. In the data processing and calculation module, the input data, with punctuation removed, is fed into the text classification model to obtain classification confidence; the confidence is then passed to the CKSFM calculation function, and the keywords in the sentence are selected by computing and comparing cksf values. In the keyword replacement module, the keywords are replaced with similar words from the glyph embedding space and the homophone library to build a candidate sequence of adversarial examples; the sequence is then fed back into the data processing and calculation module to compute cksf values, and the candidate with the highest cksf value is selected as the final adversarial example. Experimental results show that the Attack Success Rate (ASR) of adversarial examples generated by WordIllusion is higher than that of the baseline methods on most deep learning models; on the Deep Pyramid Convolutional Neural Network (DPCNN) model in the news classification scenario, it exceeds the CWordAttack method by up to 41.73 percentage points. In addition, the generated adversarial examples are highly similar to the original text and therefore exhibit strong deceptiveness and generalization.
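The keyword-selection and replacement loop described in the abstract can be sketched as follows. This is a minimal illustration only: the paper's actual CKSFM scoring function, glyph embedding space, and homophone library are not specified on this page, so toy stand-ins are assumed throughout (a generic `classify` callable, a simple confidence-drop score, and a caller-supplied candidate dictionary).

```python
import string

# Hypothetical sketch of the WordIllusion pipeline described in the abstract.
# `classify` is any function mapping text to the model's confidence in the
# original class; `candidates` maps a token to its glyph-similar / homophone
# replacements. Both are assumptions, not the authors' implementation.

CHINESE_PUNCT = "，。！？；：、“”‘’（）"

def strip_punctuation(text):
    """Data-processing step: remove punctuation before querying the model."""
    table = str.maketrans("", "", string.punctuation + CHINESE_PUNCT)
    return text.translate(table)

def cksf(confidence_original, confidence_perturbed):
    """Toy stand-in for the CKSFM score: the drop in the model's confidence
    in the original class caused by a perturbation."""
    return confidence_original - confidence_perturbed

def select_keyword(tokens, classify):
    """Rank tokens by cksf value (confidence drop when the token is removed)
    and return the index of the most influential one."""
    base = classify("".join(tokens))
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + tokens[i + 1:]
        scores.append((cksf(base, classify("".join(perturbed))), i))
    return max(scores)[1]

def word_illusion(tokens, classify, candidates):
    """Replace the keyword with the candidate (glyph-similar or homophone)
    that yields the highest cksf value, i.e. the largest confidence drop."""
    base = classify("".join(tokens))
    idx = select_keyword(tokens, classify)
    best = max(
        candidates.get(tokens[idx], [tokens[idx]]),
        key=lambda c: cksf(base, classify("".join(tokens[:idx] + [c] + tokens[idx + 1:]))),
    )
    return tokens[:idx] + [best] + tokens[idx + 1:]
```

In practice the single-keyword step would be iterated until the model's prediction flips or a perturbation budget is exhausted; the sketch shows one round of the greedy select-then-replace loop.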

Key words: deep neural network, Natural Language Processing(NLP), text classification, adversarial example, rectification understanding
