
Computer Engineering ›› 2023, Vol. 49 ›› Issue (9): 246-255, 264. doi: 10.19678/j.issn.1000-3428.0065814

• Graphics and Image Processing •

Adversarial Example Attack Method Based on Salient Region Optimization

Zheming LI1,2, Jindong WANG1, Jianzhong HOU2, Wei LI2, Shihua ZHANG2, Hengwei ZHANG1,*

  1. School of Cryptographic Engineering, Information Engineering University, Zhengzhou 450001, China
    2. PLA Army Staff Department, Beijing 100000, China
  • Received: 2022-09-21  Online: 2023-09-15  Published: 2023-09-14
  • Corresponding author: Hengwei ZHANG
  • About the authors:

    Zheming LI (b. 1994), male, M.S. candidate; his main research interest is deep learning

    Jindong WANG, professor, M.S.

    Jianzhong HOU, senior engineer

    Wei LI, B.S.

    Shihua ZHANG, M.S.

  • Funding:
    National Key Research and Development Program of China (2017YFB0801900)



Abstract:

Convolutional neural network-based image classification models are widely used in computer vision tasks. However, these models are susceptible to adversarial examples because of their inherent vulnerability. Most existing attack methods perturb the entire image, and the resulting global perturbation degrades the visual quality of the adversarial examples. To address this issue, this study proposes an adversarial example attack method based on salient region optimization. First, salient object detection is used to generate a saliency map for each original image, which is then binarized into a saliency mask. Combining this mask with the adversarial perturbation retains only the perturbation inside the salient region, so the perturbation is added locally. Furthermore, the Nadam optimization algorithm is introduced to stabilize the update direction of the loss function and dynamically adjust the learning rate, which accelerates convergence and effectively reduces the perceptibility of the adversarial perturbation while maintaining a high success rate in black-box attacks. Adversarial attack experiments are conducted on the ImageNet dataset under single-model and ensemble-model settings, and the image quality of the adversarial examples generated by each method is compared. The results show that, compared with the baseline methods, the proposed method improves the concealment metric by 27.2% in ensemble-model attacks and achieves a black-box attack success rate of up to 92.7%.

Key words: convolutional neural network, adversarial examples, black-box attack, local optimization, transferability
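To make the two mechanisms summarized in the abstract concrete, the sketch below shows, in PyTorch style, how a binarized saliency mask can confine an iterative gradient-sign perturbation to the salient region while a Nadam-style moment estimate stabilizes the update direction and adapts the step size. This is a minimal illustration under assumptions, not the authors' implementation: the classifier `model`, the saliency detector that produces `saliency_map`, and all hyper-parameter values (`eps`, `steps`, `mu1`, `mu2`, `delta`) are placeholders.

```python
# Minimal sketch (assumed, not the paper's released code):
# (1) a binarized saliency mask keeps the perturbation inside the salient region;
# (2) a Nadam-style moment update stabilizes the attack direction.
import torch
import torch.nn.functional as F


def binarize_saliency(saliency_map, threshold=0.5):
    """Turn a saliency map with values in [0, 1] into a 0/1 mask."""
    return (saliency_map >= threshold).float()


def salient_region_attack(model, x, y, saliency_map,
                          eps=16 / 255, steps=10,
                          mu1=0.9, mu2=0.999, delta=1e-8):
    """Iterative gradient-sign attack restricted to the salient region."""
    mask = binarize_saliency(saliency_map)   # 1 inside the salient region, 0 elsewhere
    alpha = eps / steps                      # per-step step size
    x_adv = x.clone().detach()
    m = torch.zeros_like(x)                  # first-moment (momentum) estimate
    v = torch.zeros_like(x)                  # second-moment estimate
    for t in range(1, steps + 1):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # L1-normalize the gradient over each image, as is common in transfer attacks
        grad = grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        # Nadam-style update: bias-corrected moments plus a Nesterov look-ahead term
        m = mu1 * m + (1 - mu1) * grad
        v = mu2 * v + (1 - mu2) * grad * grad
        m_hat = m / (1 - mu1 ** t)
        v_hat = v / (1 - mu2 ** t)
        nadam_dir = (mu1 * m_hat + (1 - mu1) * grad / (1 - mu1 ** t)) / (v_hat.sqrt() + delta)
        # keep the perturbation only inside the salient region via the binary mask
        x_adv = x_adv.detach() + alpha * mask * nadam_dir.sign()
        # project back into the eps-ball around x and the valid pixel range
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0.0, 1.0)
    return x_adv
```

The single multiplication by `mask` is what makes the perturbation local; the remaining steps follow the usual transfer-oriented iterative attack template, including the projection back into the eps-ball around the original image.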