
Computer Engineering ›› 2022, Vol. 48 ›› Issue (4): 158-164. doi: 10.19678/j.issn.1000-3428.0061470

• Cyberspace Security •

Adversarial Sample Defense Method Based on Noise Dissolution

YANG Wenxue1,2, WU Fei1,2, GUO Tong1,2, XIAO Limin1,2

  1. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China;
    2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  • Received: 2021-04-26  Revised: 2021-07-22  Published: 2021-08-11
  • About the authors: YANG Wenxue (b. 1998), female, M.S. candidate, research interests include deep learning security; WU Fei, Ph.D. candidate; GUO Tong, M.S. candidate; XIAO Limin, professor, Ph.D.
  • Funding: National Key R&D Program of China (2017YFB1010000); Fund of the State Key Laboratory of Software Development Environment, Beihang University (SKLSDE-2020ZX-15).

Abstract: The security problems exposed during the rapid development of Deep Neural Networks (DNNs), such as adversarial attacks, have gradually attracted attention. Since the concept of adversarial examples was first proposed, a large number of adversarial attack algorithms against DNNs have emerged, and the complexity and weak interpretability of DNNs make these attacks difficult to defend against. To ensure the universality of the defense, this paper proposes a new defense method against adversarial examples that takes preprocessing as its basic idea and exploits the specificity of adversarial examples themselves. Considering the stealthiness and fragility of adversarial attacks, the method uses a noise-dissolution process, which leverages the robustness of deep learning models, to reduce both the aggressiveness of the adversarial perturbation and its tolerance to filtering. In the subsequent filtering process, the filtering range and intensity are adjusted adaptively according to the adversarial-noise contribution, so that adversarial noise is removed in a targeted manner. The method requires no modification or adjustment of existing deep learning models and is easy to deploy. Experimental results on the ImageNet dataset show that the defense success rate against the classical adversarial attacks L-BFGS, FGSM, Deepfool, JSMA, and C&W remains above 80%, which is at least 9.25, 14.86, and 14.32 percentage points higher than the classical preprocessing defenses JPEG compression, APE-GAN, and D3 (patch-wise image denoising), respectively. The method therefore achieves a good defense effect with strong universality.
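To make the pipeline the abstract describes concrete, the following is a minimal Python/NumPy sketch of a two-stage preprocessing defense: multiplicative-noise injection ("noise dissolution") followed by contribution-weighted adaptive smoothing. It is an illustration under stated assumptions, not the paper's implementation: all function names and parameters are hypothetical, and a random array stands in for the class-activation-based contribution map the paper estimates.

```python
# Hedged sketch of a noise-dissolution + adaptive-filtering preprocessing
# defense. Assumptions: images are in [0, 1]; the classifier is robust to
# mild speckle noise; a CAM-derived contribution map is available upstream.
import numpy as np
from scipy.ndimage import gaussian_filter


def dissolve_noise(x, strength=0.05, rng=None):
    """Inject mild multiplicative (speckle) noise to destabilize the
    carefully tuned adversarial perturbation, relying on the model's
    assumed robustness to keep the clean prediction intact."""
    rng = rng or np.random.default_rng()
    speckle = 1.0 + strength * rng.uniform(-1.0, 1.0, size=x.shape)
    return np.clip(x * speckle, 0.0, 1.0)


def adaptive_filter(x, contribution, max_sigma=2.0):
    """Blend a weak and a strong Gaussian filter per pixel, weighted by a
    [0, 1] map standing in for the adversarial-noise contribution
    (e.g., derived from class activation mapping)."""
    weak = gaussian_filter(x, sigma=0.5)
    strong = gaussian_filter(x, sigma=max_sigma)
    w = np.clip(contribution, 0.0, 1.0)
    return (1.0 - w) * weak + w * strong


def defend(x, contribution):
    """Two-stage preprocessing: dissolve the noise, then filter adaptively;
    the defended image is then fed to the unmodified classifier."""
    return adaptive_filter(dissolve_noise(x), contribution)


if __name__ == "__main__":
    img = np.random.rand(224, 224)  # stand-in for a grayscale ImageNet input
    cam = np.random.rand(224, 224)  # stand-in for a CAM-based contribution map
    out = defend(img, cam)
    print(out.shape, float(out.min()), float(out.max()))
```

Because the defense is pure input preprocessing, a sketch like this can sit in front of any existing classifier without retraining, which matches the deployability claim in the abstract.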

Key words: Deep Neural Network (DNN), adversarial examples, multiplicative noise, class activation mapping, adaptive filtering
