
Computer Engineering ›› 2025, Vol. 51 ›› Issue (11): 162-170. doi: 10.19678/j.issn.1000-3428.0069983

• Cyberspace Security •

Transferability Enhancement of Adversarial Sample Directed Targeted Attack Based on Feature Fusion

LING Hai*, LING Jie

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
  • Received: 2024-06-11  Revised: 2024-07-17  Online: 2025-11-15  Published: 2024-09-11
  • Contact: LING Hai
  • Supported by: Key-Area Research and Development Program of Guangzhou (202007010004)


Abstract:

Adversarial examples can be used to mount transferable attacks on black-box models via surrogate models, without knowledge of the black-box model's internal structure or parameters; however, the targeted attacks on black-box models reported in existing studies show relatively weak transferability. This study proposes a feature-fusion-based method for enhancing the transferability of targeted attacks on images. First, adversarial examples are generated via a model-ensemble attack. Then, taking the gradient direction of the existing adversarial examples as a baseline, clean features extracted from the original image are used as perturbations to fine-tune the existing adversarial examples, improving the transferability of the targeted attack. For model ensembling, a gradient-adaptive module is introduced that weights each model by its contribution to the overall adversarial objective. To reduce gradient discrepancies among models, a gradient filter is proposed to synchronously control the gradient direction. A feature-fusion module then mixes in clean features of the original image to fine-tune the gradient direction of the existing adversarial examples, mitigating over-reliance on specific features. Comparative experiments on the ImageNet-Compatible dataset show that, relative to the Clean Feature Mixup (CFM) method, the proposed method improves the average attack success rate by 7.7 percentage points on non-robustly trained models, and by 5.3 percentage points on robustly trained models and Transformer models, demonstrating the effectiveness of the method.
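The pipeline the abstract describes — a targeted ensemble attack whose per-step gradient is evaluated on inputs mixed with clean features of the original image, with models weighted adaptively by how far they remain from the target — can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the authors' implementation: the surrogates are toy linear classifiers, the "feature fusion" is a simple convex mix of the adversarial and clean inputs, and the adaptive weights are normalized per-model target losses.

```python
import numpy as np

# Hypothetical sketch of a targeted ensemble attack with clean-feature
# mixing. Toy linear surrogates stand in for real networks; the mixing
# rule and adaptive weighting are illustrative assumptions only.

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def make_linear_model(d, k):
    """A toy surrogate classifier: logits = W @ x."""
    return rng.normal(size=(k, d)) / np.sqrt(d)

def targeted_grad(W, x, target):
    """Gradient w.r.t. x of the cross-entropy loss toward `target`."""
    p = softmax(W @ x)
    onehot = np.zeros(len(p))
    onehot[target] = 1.0
    return W.T @ (p - onehot)

def ensemble_targeted_attack(models, x_clean, target,
                             eps=0.5, alpha=0.05, steps=40, mix=0.1):
    """Sign-gradient targeted attack over an ensemble. Each step, the
    gradient is computed on a convex combination of the current adversarial
    input and the clean input (a stand-in for the feature-fusion module),
    and per-model gradients are weighted by each model's remaining target
    loss (a stand-in for the gradient-adaptive module)."""
    x_adv = x_clean.copy()
    for _ in range(steps):
        x_mixed = (1 - mix) * x_adv + mix * x_clean   # clean-feature mixing
        grads = [targeted_grad(W, x_mixed, target) for W in models]
        losses = [-np.log(softmax(W @ x_mixed)[target] + 1e-12)
                  for W in models]
        w = np.array(losses)
        w /= w.sum()                                  # favour harder models
        g = sum(wi * gi for wi, gi in zip(w, grads))
        x_adv = x_adv - alpha * np.sign(g)            # descend toward target
        x_adv = np.clip(x_adv, x_clean - eps, x_clean + eps)
    return x_adv

d, k = 32, 5
models = [make_linear_model(d, k) for _ in range(3)]
x = rng.normal(size=d)
target = 2
x_adv = ensemble_targeted_attack(models, x, target, eps=0.5)
# Inspect how the surrogates now rank the target class
preds = [int(np.argmax(W @ x_adv)) for W in models]
```

In this sketch the mixing coefficient plays the role the abstract assigns to clean features: by evaluating gradients on a partially clean input, the update is discouraged from overfitting to features specific to one surrogate, which is the intuition behind the transferability gain.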

Key words: deep learning, adversarial attacks, adversarial examples, directed targeted attacks, transferability