
Computer Engineering ›› 2025, Vol. 51 ›› Issue (1): 42-50. doi: 10.19678/j.issn.1000-3428.0069656

• Image Processing Based on Perceptual Information •

  • Supported by: Young Scientists Fund of the National Natural Science Foundation of China (62302172)

Adversarial Robust Distillation Method Based on Intensity Correlation Regularization Learning

LIN Shuobin1, CAI Jieyi1, FANG Xiaocheng1, ZHANG Zheng2, LU Guangming2, CHEN Bingzhi1,*()   

  1. School of Software, South China Normal University, Foshan 528000, Guangdong, China
    2. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen 518000, Guangdong, China
  • Received:2024-03-26 Online:2025-01-15 Published:2024-08-20
  • Contact: CHEN Bingzhi


Abstract:

This research introduces an Adversarial Robust Distillation (ARD) method based on Intensity Correlation Regularization Learning (ICRL) to address the limitations of existing ARD approaches, which are hindered by insufficient and unreliable guidance from the teacher network and by fixed attack strengths. The proposed method comprises two key modules: multidimensional knowledge distillation and dynamic adjustment of attack intensity. Multidimensional knowledge distillation addresses the distributional discrepancies caused by inadequate or unreliable teacher guidance by incorporating instance-level and class-level knowledge distillation across teacher and student logits, as well as introspective self-distillation within the student network. To enable the attack strength to be updated adaptively as the student network's robustness improves, an efficient intensity dynamic adjustment algorithm is designed to dynamically select and assign an appropriate attack intensity for each instance. Additionally, ICRL regularizes the introspective self-distillation dimension from the attack-strength perspective, adaptively normalizing the student's introspective loss and mitigating the impact of instances with extreme adversarial perturbations. Extensive experimental results on the CIFAR-10 and CIFAR-100 datasets demonstrate that this method functions as a universal plugin for most mainstream ARD frameworks and significantly enhances the resilience of baseline methods against multistep attacks. In particular, when combined with the current state-of-the-art baseline AdaAD and a ResNet-18 student network, AdaAD-ICRL improves adversarial robust accuracy against Projected Gradient Descent (PGD)-10 attacks by 2.06 and 2.11 percentage points on CIFAR-10 and CIFAR-100, respectively, validating the compatibility and effectiveness of ICRL within existing frameworks.
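The abstract describes per-instance attack intensities that strengthen as the student grows more robust, plus an intensity-based down-weighting of the student's introspective loss. The sketch below illustrates one plausible reading of that scheduling logic only; the class name, the correctness-feedback rule, the epsilon range, and the `introspection_weight` heuristic are all illustrative assumptions, not the paper's actual algorithm or code.

```python
# Illustrative sketch (assumed, not the paper's implementation): each training
# instance keeps its own L-infinity attack budget, raised when the student
# withstands the current attack and lowered when it does not.

class IntensityScheduler:
    def __init__(self, n_instances, eps_min=2 / 255, eps_max=8 / 255, step=1 / 255):
        self.eps_min, self.eps_max, self.step = eps_min, eps_max, step
        # Every instance starts at the weakest attack strength.
        self.eps = [eps_min] * n_instances

    def get(self, idx):
        """Attack budget to use for instance `idx` this epoch."""
        return self.eps[idx]

    def update(self, idx, student_was_robust):
        """Strengthen the attack once the student resists it; back off otherwise."""
        if student_was_robust:
            self.eps[idx] = min(self.eps[idx] + self.step, self.eps_max)
        else:
            self.eps[idx] = max(self.eps[idx] - self.step, self.eps_min)


def introspection_weight(eps, eps_max=8 / 255):
    """Hypothetical regularizer: down-weight the introspective self-distillation
    loss for instances attacked at extreme strength, echoing the abstract's
    goal of avoiding the influence of extreme adversarial perturbations."""
    return 1.0 - eps / eps_max
```

In a training loop, one would query `get(idx)` to craft the PGD adversarial example for each instance, then call `update(idx, ...)` with whether the student classified it correctly; `introspection_weight` would scale that instance's introspective loss term.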

Key words: adversarial robustness, Adversarial Training (AT), knowledge distillation, regularization, dynamic intensity adjustment