
Computer Engineering ›› 2026, Vol. 52 ›› Issue (1): 303-313. doi: 10.19678/j.issn.1000-3428.0069815

• Cyberspace Security •

Adversarial Sample Generation Method for Chinese Text Based on Word Reproduction

WANG Chundong1,2,*, ZHAO Zhihang1,2, YANG Weijie1,2, FANG Shunyao1,2

  1. School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
  2. National Engineering Laboratory for Computer Virus Prevention and Control Technology, Tianjin 300384, China
  • Received: 2024-05-06 Revised: 2024-07-15 Online: 2026-01-15 Published: 2026-01-15
  • Corresponding author: WANG Chundong
  • About the authors:

    WANG Chundong (CCF member), male, professor and doctoral supervisor; main research interests: network information security and pervasive computing

    ZHAO Zhihang, master's student

    YANG Weijie, master's student

    FANG Shunyao, master's student

  • Funding:
    National Key Research and Development Program of China, "Blockchain" Key Special Project (2023YFB2703903); National Key Research and Development Program of China, "Science and Technology Boosting the Economy 2020" Key Special Project (SQ2020YFF0413781)

Abstract:

With the accumulation of massive amounts of data and continuous improvements in computing power, Deep Neural Networks (DNNs) have been widely applied to tasks such as image recognition and text classification. However, studies have shown that DNN-based text classification models are vulnerable to adversarial samples maliciously constructed by attackers, who can change a model's classification result by deleting or modifying the original text, inserting obfuscating statements, or adding punctuation marks. Most existing adversarial sample generation methods raise attack accuracy by mixing several types of replacement pools at the expense of concealment, and therefore cannot balance the attack success rate against the concealment of the adversarial samples. To address this problem, this study proposes WordReproduction, a Chinese adversarial sample generation method designed around the concealment of adversarial samples. The saliency score of each Chinese character is calculated from the part of speech of the character itself combined with character-level and word-level dimensions. In the key character and word replacement module, three glyph replacement methods (a vector space of visually similar characters, a glyph-splitting candidate pool, and word inversion) are used to replace key characters and key words, respectively. Based on the morphological features of Chinese characters, the study also designs a glyph similarity evaluation algorithm to better quantify how similar an adversarial sample is to the original text. Experimental results show that the adversarial samples generated by WordReproduction outperform those of the baseline methods in both attack success rate and glyph similarity. On the Transformer model in the sentiment classification scenario, compared with the WordHandling method, WordReproduction improves the attack success rate by 51.64 percentage points and the glyph similarity score by 0.53. The generated adversarial samples not only mislead the model's classification results but are also highly concealed, making them difficult for human readers to detect.

Key words: Deep Neural Network (DNN), text classification, adversarial sample generation, glyph replacement, similarity assessment
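
The abstract names two steps, saliency scoring and word inversion, without giving their formulas. The Python sketch below is only an illustration of the general idea and not the authors' algorithm: it assumes a leave-one-out saliency (the drop in a classifier score when a token is removed) in place of the paper's part-of-speech-based score, and it treats word inversion as reversing the character order of the most salient word. The names `invert_word`, `saliency_scores`, `attack_once`, and `toy_positive_score` are hypothetical, and the toy scorer stands in for a real DNN classifier.

```python
from typing import Callable, List

Scorer = Callable[[List[str]], float]


def invert_word(word: str) -> str:
    """Reverse the character order of a word (assumed form of word inversion)."""
    return word[::-1]


def saliency_scores(tokens: List[str], score: Scorer) -> List[float]:
    """Leave-one-out saliency: score drop when each token is removed.

    This is a stand-in assumption; the paper's score also uses part of speech.
    """
    base = score(tokens)
    return [base - score(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def attack_once(tokens: List[str], score: Scorer) -> List[str]:
    """Invert the single most salient token and return the perturbed copy."""
    scores = saliency_scores(tokens, score)
    target = max(range(len(tokens)), key=scores.__getitem__)
    adversarial = list(tokens)
    adversarial[target] = invert_word(tokens[target])
    return adversarial


def toy_positive_score(tokens: List[str]) -> float:
    """Hypothetical classifier: 1.0 if any known positive word appears, else 0.0."""
    positive_words = {"喜欢", "好看"}
    return 1.0 if any(token in positive_words for token in tokens) else 0.0


if __name__ == "__main__":
    sentence = ["我", "喜欢", "这部", "电影"]
    perturbed = attack_once(sentence, toy_positive_score)
    print(perturbed, toy_positive_score(sentence), toy_positive_score(perturbed))
```

In this toy setting the inverted word no longer matches the scorer's vocabulary, so the positive score disappears while the text stays readable to a human. A real evaluation would replace the toy scorer with the target model and measure attack success rate and glyph similarity over a dataset.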