
Computer Engineering ›› 2025, Vol. 51 ›› Issue (8): 131-140. doi: 10.19678/j.issn.1000-3428.0069348

• Artificial Intelligence and Pattern Recognition •

Chinese Adversarial Examples Generation Based on Adaptive Beam Search Algorithm

XIA Niming, ZHANG Jie*

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China
  • Received: 2024-02-04 Revised: 2024-03-06 Online: 2025-08-15 Published: 2024-05-17
  • Contact: ZHANG Jie

Chinese Adversarial Examples Generation Based on Adaptive Beam Search Algorithm (Chinese-language listing, translated)

XIA Niming, ZHANG Jie*

  1. School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China
  • Corresponding author: ZHANG Jie
  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China (61902195); National Key Research and Development Program of China (2018YFB1500902)

Abstract:

Deep Neural Networks (DNNs) are extremely vulnerable to adversarial examples, where subtle perturbations to legitimate inputs can cause a model to yield erroneous outputs. Exploring adversarial attacks can improve the robustness of deep learning models and advance the interpretability of DNNs. Existing methods for generating Chinese adversarial examples typically employ simple transformation strategies, emphasizing isolated Chinese linguistic features without considering the contextual effect of attacks. Hence, a heuristic-based algorithm, BSCA, is proposed in this study. By comprehensively analyzing linguistic variations and incorporating prior knowledge of Chinese character formation, pronunciation, and glyph form, a strategy for accurately assessing deviations between Chinese characters is designed. The adversarial search space is constructed based on this deviation strategy, and an improved beam search algorithm is used to optimize the generation of Chinese adversarial examples in black-box attacks. Under strict constraints on perturbation size and semantic deviation, BSCA can automatically adapt to the requirements of different scenarios. Experimental evaluations conducted on TextCNN, TextRNN, and Bidirectional Encoder Representations from Transformers (BERT) for two Natural Language Processing (NLP) tasks indicate that BSCA reduces classification accuracy by at least 63.84 percentage points while incurring lower attack costs than baseline methods.
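The paper's BSCA operates on Chinese characters using glyph- and pronunciation-based substitutions, which are not detailed in this abstract. As a rough illustration of the beam-search core only, the following toy Python sketch attacks a stand-in black-box scorer using an ASCII substitution table; the names `toy_model`, `SUBSTITUTES`, and `beam_search_attack`, and all parameters, are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of beam search for black-box text attacks.
# In BSCA the substitution candidates would come from Chinese character
# similarity (formation, pronunciation, glyph form); ASCII stand-ins
# keep this sketch self-contained and runnable.

def toy_model(text):
    """Stand-in black-box classifier: confidence that `text` is 'positive'.
    Here, confidence is just the fraction of characters from a fixed set."""
    positive_chars = set("good")
    return sum(c in positive_chars for c in text) / max(len(text), 1)

# Hypothetical visually similar substitutions (assumption, for illustration).
SUBSTITUTES = {"g": ["q"], "o": ["0"], "d": ["b"]}

def beam_search_attack(text, model, beam_width=3, max_edits=2):
    """Keep the `beam_width` perturbed texts that most reduce the model's
    confidence, expanding one character substitution per round."""
    beam = [(model(text), text)]
    for _ in range(max_edits):
        candidates = []
        for _, cur in beam:
            for i, ch in enumerate(cur):
                for sub in SUBSTITUTES.get(ch, []):
                    adv = cur[:i] + sub + cur[i + 1:]
                    candidates.append((model(adv), adv))
        if not candidates:
            break
        # Lower confidence = stronger attack; keep the best beam_width.
        beam = sorted(set(candidates))[:beam_width]
    return beam[0]  # (confidence, adversarial text)

conf, adv = beam_search_attack("good", toy_model)
```

With `max_edits=2`, two substitutions drop the toy confidence from 1.0 to 0.5; the abstract's "strict constraints on perturbation size" would correspond to bounding `max_edits` and filtering candidates by semantic deviation, which this sketch omits.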

Key words: adversarial examples, Chinese feature, black-box attack, beam search, text classification

Abstract (translated from the Chinese):

Deep Neural Networks (DNNs) are highly susceptible to adversarial examples: adding subtle perturbations to the original text is enough to induce the target model into misclassification. Research on adversarial example generation not only helps improve model robustness but also advances work on DNN interpretability. In the Chinese adversarial domain, most existing generation methods adopt a single transformation strategy, consider only a subset of Chinese linguistic features, and ignore the effect an attack has on the surrounding context. To address these problems, a heuristic Chinese adversarial example generation method, BSCA, is proposed. By comprehensively analyzing the differences between phonographic and logographic writing systems, and combining prior knowledge of Chinese character formation, pronunciation, glyph form, and cognitive linguistics, a Chinese text perturbation strategy that accurately evaluates differences between Chinese characters is designed. The perturbation strategy is used to construct the adversarial search space, and an improved beam search algorithm optimizes the black-box attack process. Under strict limits on perturbation size and semantic drift, BSCA can automatically select different attack strategies to suit different scenario requirements. Experiments on the TextCNN, TextRNN, and BERT (Bidirectional Encoder Representations from Transformers) models across multiple Natural Language Processing (NLP) tasks show that BSCA generalizes well, reducing classification accuracy by at least 63.84 percentage points while incurring a lower attack cost than baseline methods.

Key words (translated): adversarial examples, Chinese features, black-box attack, beam search, text classification