Computer Engineering, 2023, Vol. 49, Issue (8): 54-62. doi: 10.19678/j.issn.1000-3428.0066416

• Artificial Intelligence and Pattern Recognition •

Adversarial Sample Generation for Chinese Classification Model Based on Prototypical Network

Yanyan YANG1, Mingxuan XIE2, Jiangxia CAO2,3, Xuebin WANG2, Tingwen LIU2,3, Yanhui DU1

  1. College of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100084, China
  3. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-12-01 Online: 2023-08-15 Published: 2023-08-15
  • About the authors:

    Yanyan YANG (born 1986), female, master's student; main research interest: network security

    Mingxuan XIE, master's degree

    Jiangxia CAO, doctoral student

    Xuebin WANG, Ph.D.

    Tingwen LIU, research professor, Ph.D., doctoral supervisor

    Yanhui DU, professor, Ph.D., doctoral supervisor

  • Funding:
    National Key Research and Development Program of China (2021YFB3100600); Strategic Priority Research Program of the Chinese Academy of Sciences (XDC02040400); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021153)

Abstract:

Adversarial sample generation adds imperceptible perturbations to the original text so that a deep learning model produces an incorrect output; it is commonly used to test the robustness of text classification models. Most existing adversarial sample generation methods rely on black-box or white-box attacks, which must interact with the victim model while generating adversarial samples. Their attack effectiveness depends on the structure and performance of the victim model, so they generalize poorly. In addition, the transformation strategies used by adversarial sample generation methods for Chinese text are too limited to produce diverse Chinese adversarial samples. To address these issues, this study proposes AEGP, an adversarial sample generation method based on a prototypical network. First, based on a comprehensive analysis of the structural characteristics of Chinese characters and of human reading habits, eight Chinese text transformation strategies that preserve semantic consistency are designed. Next, a prototypical network is built with a convolutional neural network as the encoder, and other texts in the same category are used to help identify the text fragments that need to be transformed. Finally, the text transformation strategies are applied to the selected fragments to generate adversarial samples. The experimental results show that AEGP generalizes well and can generate diverse adversarial samples. Compared with the baseline methods, the accuracy of the classification models on the adversarial samples generated by AEGP drops by 9.21 to 32.64 percentage points, which shows how sensitive the models are to imperceptible perturbations.

Key words: adversarial sample generation, classification model, prototypical network, text representation, transformation strategy
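The fragment-selection idea in the abstract (use other texts of the same category, through a prototypical network, to find which parts of a text to transform) can be illustrated with a minimal sketch. This is not the authors' implementation. The class prototype is computed as the mean embedding of same-class texts, as in standard prototypical networks. Everything else is an assumption made for illustration: `toy_encode` is a hypothetical character-bucket stand-in for the CNN encoder, and `rank_fragments` uses a leave-one-out score (how far deleting a character moves the text from its class prototype), which the abstract does not confirm.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[str], Vector]


def toy_encode(text: str, dim: int = 16) -> Vector:
    """Hypothetical stand-in for the CNN encoder: normalized character-bucket counts."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    n = max(len(text), 1)  # avoid division by zero for empty text
    return [v / n for v in vec]


def prototype(texts: Sequence[str], encode: Encoder) -> Vector:
    """Class prototype: the mean embedding of texts from one category."""
    embs = [encode(t) for t in texts]
    dim = len(embs[0])
    return [sum(e[i] for e in embs) / len(embs) for i in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_fragments(text: str, proto: Vector, encode: Encoder) -> List[Tuple[float, int, str]]:
    """Assumed scoring rule: delete each character in turn and measure how much
    the text's distance to its class prototype grows. A larger increase means
    the character carries more class evidence, so it is ranked first."""
    base = distance(encode(text), proto)
    scores = []
    for i, ch in enumerate(text):
        reduced = text[:i] + text[i + 1:]
        scores.append((distance(encode(reduced), proto) - base, i, ch))
    return sorted(scores, key=lambda s: s[0], reverse=True)
```

Under these assumptions, the highest-ranked positions would be the candidates handed to a transformation strategy. The scoring uses only same-class texts and the encoder, never the victim classifier, which matches the abstract's point that the method does not need to interact with the victim model.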