
Computer Engineering ›› 2025, Vol. 51 ›› Issue (1): 42-50. doi: 10.19678/j.issn.1000-3428.0069656

• Image Processing Based on Perceptual Information •

  • Supported by: Young Scientists Fund of the National Natural Science Foundation of China (62302172)

Adversarial Robust Distillation Method Based on Intensity Correlation Regularization Learning

LIN Shuobin1, CAI Jieyi1, FANG Xiaocheng1, ZHANG Zheng2, LU Guangming2, CHEN Bingzhi1,*()   

  1. School of Software, South China Normal University, Foshan 528000, Guangdong, China
    2. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518000, Guangdong, China
  • Received:2024-03-26 Online:2025-01-15 Published:2024-08-20
  • Contact: CHEN Bingzhi


Abstract:

This research introduces an Adversarial Robust Distillation (ARD) method based on Intensity Correlation Regularization Learning (ICRL) to address the limitations of existing ARD approaches, which are hindered by insufficient and unreliable guidance from the teacher network and by fixed attack strengths. The proposed method comprises two key modules: multidimensional knowledge distillation and dynamic adjustment of attack intensity. Multidimensional knowledge distillation addresses the distributional discrepancies caused by inadequate or unreliable teacher guidance by incorporating instance-level and class-level knowledge distillation across teacher and student logits, as well as introspective self-distillation within the student network. To enable the attack strength to be updated adaptively as the student network's robustness improves, an efficient intensity dynamic adjustment algorithm is designed to dynamically select and assign an appropriate attack intensity for each instance. Additionally, ICRL regularizes the introspective self-distillation dimension from the attack-strength perspective, adaptively normalizing the student's introspective loss and mitigating the impact of instances with extreme adversarial perturbations. Extensive experimental results on the CIFAR-10 and CIFAR-100 datasets demonstrate that this method functions as a universal plugin for most mainstream ARD frameworks and significantly enhances the resilience of baseline methods against multistep attacks. In particular, when combined with the current state-of-the-art baseline AdaAD and a ResNet-18 student network, AdaAD-ICRL improves adversarial robust accuracy against Projected Gradient Descent (PGD)-10 attacks by 2.06 and 2.11 percentage points on CIFAR-10 and CIFAR-100, respectively, validating the compatibility and effectiveness of ICRL within existing frameworks.
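The abstract describes per-instance attack intensities that strengthen as the student grows more robust, plus an intensity-based down-weighting of the student's introspective loss. The sketch below illustrates one plausible reading of that scheduling logic only; the class name, the correctness-feedback rule, the epsilon range, and the `introspection_weight` heuristic are all illustrative assumptions, not the paper's actual algorithm or code.

```python
# Illustrative sketch (assumed, not the paper's implementation): each training
# instance keeps its own L-infinity attack budget, raised when the student
# withstands the current attack and lowered when it does not.

class IntensityScheduler:
    def __init__(self, n_instances, eps_min=2 / 255, eps_max=8 / 255, step=1 / 255):
        self.eps_min, self.eps_max, self.step = eps_min, eps_max, step
        # Every instance starts at the weakest attack strength.
        self.eps = [eps_min] * n_instances

    def get(self, idx):
        """Attack budget to use for instance `idx` this epoch."""
        return self.eps[idx]

    def update(self, idx, student_was_robust):
        """Strengthen the attack once the student resists it; back off otherwise."""
        if student_was_robust:
            self.eps[idx] = min(self.eps[idx] + self.step, self.eps_max)
        else:
            self.eps[idx] = max(self.eps[idx] - self.step, self.eps_min)


def introspection_weight(eps, eps_max=8 / 255):
    """Hypothetical regularizer: down-weight the introspective self-distillation
    loss for instances attacked at extreme strength, echoing the abstract's
    goal of avoiding the influence of extreme adversarial perturbations."""
    return 1.0 - eps / eps_max
```

In a training loop, one would query `get(idx)` to craft the PGD adversarial example for each instance, then call `update(idx, ...)` with whether the student classified it correctly; `introspection_weight` would scale that instance's introspective loss term.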

Key words: adversarial robustness, Adversarial Training (AT), knowledge distillation, regularization, dynamic intensity adjustment