Text Adversarial Sample Generation Method for Chinese Language with Multi-level Perturbation Localization

doi:10.19678/j.issn.1000-3428.0069837

Abstract

To improve the accuracy of perturbation localization when generating adversarial samples under black-box attacks on Chinese text, and to address the tendency of existing methods to ignore contextual relevance and semantic density when evaluating word importance, this study proposes MDLM, a Chinese text adversarial sample generation method with multi-level perturbation localization. First, multi-source heterogeneous deep learning models are integrated to build a multi-level decision model that combines different feature extraction capabilities. Second, three new evaluation functions are added to assess word importance from multiple dimensions. Finally, the multi-level decision model and the evaluation functions work together to precisely locate the perturbation points in the original text. For adversarial sample generation, MDLM combines several text replacement strategies, including traditional Chinese characters, Pinyin, polyphonic words, and homophones, with the aim of maintaining the attack success rate while increasing the diversity of the generated adversarial samples. Experimental results show that MDLM perturbs multiple target models effectively across multiple datasets, with a maximum attack perturbation rate of 43.5%, further strengthening the attack capability of the adversarial samples. In addition, ablation experiments on the multi-level perturbation localization capability show that combining the evaluation functions with the decision model at multiple levels significantly improves the attack effectiveness of the generated adversarial samples.

Key words: black-box attack, perturbation localization, decision model, word importance assessment, adversarial sample generation
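
The pipeline summarized in the abstract (score each word's importance with several models, then perturb the highest-ranked words with Chinese-specific substitutions) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the keyword-based toy models stand in for the black-box decision models, the deletion-based score stands in for the paper's three evaluation functions (which the abstract does not specify), and the `SUBSTITUTES` table is a hand-made stand-in for the Pinyin, homophone, and traditional-character replacement strategies.

```python
from typing import Callable, Dict, List, Sequence

# A black-box model maps a token list to the confidence of the original label.
Model = Callable[[Sequence[str]], float]

# Hypothetical substitution table (Pinyin / traditional form); not from the paper.
SUBSTITUTES: Dict[str, List[str]] = {
    "难吃": ["nan chi", "難吃"],
    "很": ["hen"],
}


def make_keyword_model(weights: Dict[str, float]) -> Model:
    """Toy stand-in for a black-box classifier: confidence grows with keyword hits."""
    def model(tokens: Sequence[str]) -> float:
        raw = sum(weights.get(t, 0.0) for t in tokens)
        return min(1.0, max(0.0, 0.4 + raw))
    return model


def deletion_importance(tokens: Sequence[str], models: Sequence[Model]) -> List[float]:
    """Average, over all models, the confidence drop caused by deleting each token."""
    scores = []
    for i in range(len(tokens)):
        reduced = list(tokens[:i]) + list(tokens[i + 1:])
        drop = sum(m(tokens) - m(reduced) for m in models) / len(models)
        scores.append(drop)
    return scores


def attack(tokens: Sequence[str], models: Sequence[Model],
           target: Model, threshold: float = 0.5) -> List[str]:
    """Replace tokens in order of importance until the target's confidence drops below threshold."""
    adv = list(tokens)
    scores = deletion_importance(tokens, models)
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    for i in order:
        if target(adv) < threshold:
            break  # the target model no longer predicts the original label
        candidates = SUBSTITUTES.get(adv[i], [])
        if candidates:
            # Keep the candidate that lowers the target's confidence the most.
            adv[i] = min(candidates, key=lambda c: target(adv[:i] + [c] + adv[i + 1:]))
    return adv


# Usage: two toy decision models locate the perturbation point; a third is attacked.
tokens = ["这家", "外卖", "很", "难吃"]
models = [make_keyword_model({"难吃": 0.4}), make_keyword_model({"难吃": 0.3, "很": 0.1})]
target = make_keyword_model({"难吃": 0.4})

scores = deletion_importance(tokens, models)
adv = attack(tokens, models, target)
changed = sum(a != b for a, b in zip(tokens, adv))
print(adv, changed)
```

Averaging the deletion score over several models is only one simple way to combine heterogeneous judgments. The paper's multi-level combination of decision models and evaluation functions may differ substantially from this sketch.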

HOU Yan, CHE Lei, LI Hui. Text Adversarial Sample Generation Method for Chinese Language with Multi-level Perturbation Localization[J]. Computer Engineering, 2025, 51(7): 232-243.

https://www.ecice06.com/CN/Y2025/V51/I7/232

Figures/Tables (13)

Fig. 1 MDLM system architecture

Fig. 2 Decision model structure

Fig. 3 Results of adversarial sample attacks on THUCNews dataset

Fig. 4 Results of adversarial sample attacks on hotel review dataset

Fig. 5 Results of adversarial sample attacks on takeout review dataset

Fig. 6 Ablation experiment results on THUCNews dataset


