
Computer Engineering

   

A Multi-level Chinese Adversarial Example Generation Method Based on Glyph and Semantic

  

  • Published: 2025-04-14


Abstract: Deep neural network language models are vulnerable to adversarial attacks in deployment: adding small perturbations to an original sample yields an adversarial example that misleads the model into an incorrect decision. Studying how adversarial examples are generated is therefore an effective way to expose and evaluate robustness deficiencies in these models. Existing Chinese adversarial example generation methods mostly concentrate on raising the attack success rate and neglect quality metrics such as the stealthiness of the generated examples. Drawing on the distinctive glyph structure and semantic features of Chinese, this work proposes CMSPSO, a multi-level adversarial example generation method that combines Chinese character glyph and semantic information. CMSPSO uses a particle swarm optimization algorithm to search pre-built replacement knowledge bases for suitable combinations of replacements. Its character-level variant, CMSPSO-M, exploits visually similar characters across multiple languages: a trained Siamese neural network computes the visual similarity between characters, and the results are used to construct a high-quality knowledge base of visual replacement characters. Its word-level variant, CMSPSO-S, builds a knowledge base of semantic replacement words from HowNet and WordNet. The generated examples are evaluated using attack effectiveness and attack cost metrics. Experimental results show that CMSPSO achieves strong attack effectiveness across multiple models and datasets; in particular, CMSPSO-M reaches an attack success rate of 84.22% against the RoBERTa model on the XNLI dataset. CMSPSO also shows a clear advantage on attack cost metrics, and its overall performance exceeds that of the baseline methods.

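The abstract describes a particle swarm search over a replacement knowledge base but gives no algorithmic detail. The following is a minimal sketch of that general idea, not the authors' CMSPSO implementation. Everything in it is an assumption made for illustration: the function name `pso_substitution_search`, the movement probabilities, the tiny hand-written candidate table standing in for a knowledge base, and the `toy_score` function standing in for a victim model's feedback combined with an attack cost penalty.

```python
import random


def pso_substitution_search(tokens, candidates, score_fn,
                            n_particles=8, n_iters=20, seed=0):
    """Simplified discrete PSO-style search over character substitutions.

    tokens:     list of characters in the original text
    candidates: dict mapping a position to its allowed replacement characters
                (a hypothetical stand-in for a replacement knowledge base)
    score_fn:   callable taking a list of characters; higher is better
    """
    rng = random.Random(seed)
    positions = sorted(candidates)

    def random_particle():
        # Start from the original text with one random substitution applied.
        particle = list(tokens)
        if positions:
            pos = rng.choice(positions)
            particle[pos] = rng.choice(candidates[pos])
        return particle

    swarm = [random_particle() for _ in range(n_particles)]
    personal_best = [list(p) for p in swarm]
    personal_score = [score_fn(p) for p in swarm]
    top = max(range(n_particles), key=personal_score.__getitem__)
    global_best, global_score = list(personal_best[top]), personal_score[top]

    for _ in range(n_iters):
        for i, particle in enumerate(swarm):
            for pos in positions:
                r = rng.random()
                if r < 0.3:
                    # Move toward this particle's own best position.
                    particle[pos] = personal_best[i][pos]
                elif r < 0.6:
                    # Move toward the best position found by the swarm.
                    particle[pos] = global_best[pos]
                elif r < 0.7:
                    # Mutate: try another candidate from the table.
                    particle[pos] = rng.choice(candidates[pos])
            s = score_fn(particle)
            if s > personal_score[i]:
                personal_best[i], personal_score[i] = list(particle), s
                if s > global_score:
                    global_best, global_score = list(particle), s
    return global_best, global_score


# Toy setup: visually similar replacements for two positions.
tokens = list("明天见")
candidates = {1: ["夭", "无"], 2: ["贝"]}


def toy_score(text):
    # Pretend the victim is fooled once "天" is gone; each change costs 1.
    fooled = 10 if "天" not in text else 0
    changes = sum(a != b for a, b in zip(text, tokens))
    return fooled - changes


best, score = pso_substitution_search(tokens, candidates, toy_score)
print("".join(best), score)
```

Subtracting the number of changed characters from the score is one simple way to make the search prefer low-cost perturbations. In a real attack, `score_fn` would query the victim model, and the candidate table would come from a glyph-based or semantic knowledge base like those the abstract describes.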