Computer Engineering, 2023, Vol. 49, Issue 10: 230-238. doi: 10.19678/j.issn.1000-3428.0065638

• Graphics and Image Processing •

Adversarial Examples Detection Method Based on Image Denoising and Compression

Feiyu WANG1, Fan ZHANG2, Jiayu DU3, Hongle LEI3, Xiaofeng QI2

  1. Institute of Information Technology, Information Engineering University, Zhengzhou 450002, China
  2. National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450002, China
  3. Network Communication and Security Purple Mountain Laboratory, Nanjing 211111, China
  • Received: 2022-08-31 Published: 2023-10-15 Released: 2023-10-10
  • About the authors:

    Feiyu WANG (1997-), male, master's student; research interests: artificial intelligence, adversarial example defense

    Fan ZHANG, associate researcher, Ph.D.

    Jiayu DU, engineer, M.S.

    Hongle LEI, engineer, M.S.

    Xiaofeng QI, assistant researcher, M.S.

Abstract:

Numerous deep learning achievements in computer vision have been widely applied in real life. However, adversarial examples can cause deep learning models to misclassify inputs with high confidence, with serious security consequences. Moreover, existing adversarial example detection methods generally suffer from high computational cost or depend on statistical characteristics of the examples. This paper proposes an adversarial example detection method based on prediction inconsistency. Treating adversarial perturbations as unnecessary features, the method applies image denoising or compression to compress the feature space of an example, thereby reducing the adversarial perturbation. For normal examples, a deep learning model's classification results usually differ only slightly before and after feature-space compression, whereas for adversarial examples they differ significantly after the same processing. The method therefore detects adversarial attacks by measuring the distance between the model's prediction on the original input and its prediction on the feature-space-compressed input; if the distance exceeds a threshold, the input is classified as adversarial. The training set of the detector is selected independently of adversarial examples, and the original deep learning model requires no modification. Experimental results show that the proposed method effectively detects classic attacks such as the Fast Gradient Sign Method (FGSM), the Jacobian-based Saliency Map Attack (JSMA), and the Carlini and Wagner (C&W) attack while maintaining a low false positive rate, achieving average detection rates of 99.77% on MNIST and 87.90% on CIFAR-10.
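The detection rule described in the abstract (compare the model's prediction on an input with its prediction after denoising or compression, and flag the input when the two differ by more than a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two squeezers (bit-depth reduction as a simple compression step, a 3x3 median filter as a simple denoising step), the L1 distance between softmax vectors, taking the maximum over squeezers, and the toy linear classifier standing in for the protected model are all hypothetical choices.

```python
import numpy as np


def bit_depth_reduce(x: np.ndarray, bits: int) -> np.ndarray:
    """Hypothetical compression squeezer: quantize pixels in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels


def median_smooth(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Hypothetical denoising squeezer: k x k median filter on a 2-D grayscale image."""
    lo, hi = (k - 1) // 2, k // 2
    padded = np.pad(x, ((lo, hi), (lo, hi)), mode="reflect")
    # One k x k window per output pixel: shape (H, W, k, k).
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows, axis=(-2, -1))


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()


def make_toy_model(n_pixels: int, n_classes: int = 10, seed: int = 0):
    """Hypothetical stand-in for the protected classifier: random linear layer + softmax."""
    w = np.random.default_rng(seed).normal(size=(n_pixels, n_classes))
    return lambda img: softmax(img.reshape(-1) @ w)


def inconsistency_score(model, x, squeezers) -> float:
    """Largest L1 distance between the prediction on x and on each squeezed copy of x."""
    p = model(x)
    return float(max(np.abs(p - model(s(x))).sum() for s in squeezers))


def is_adversarial(model, x, squeezers, threshold: float) -> bool:
    """Flag x as adversarial when its prediction inconsistency exceeds the threshold."""
    return inconsistency_score(model, x, squeezers) > threshold
```

For example, with `model = make_toy_model(28 * 28)` and `squeezers = [lambda im: bit_depth_reduce(im, 1), median_smooth]`, calling `is_adversarial(model, image, squeezers, threshold)` on a 28x28 image in [0, 1] returns the detector's decision. Because softmax outputs are probability vectors, the L1 score always lies in [0, 2], which keeps a single threshold meaningful across squeezers; the model itself is only queried, never modified.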

Key words: deep learning, adversarial examples, adversarial examples detection, image denoising, image compression
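The abstract above states that the detector's training set is selected without reference to adversarial examples and that detection comes with a low false positive rate. One common way to obtain both, assumed here rather than taken from the paper, is to compute the inconsistency score on clean examples only and place the threshold at a quantile of those scores, so that a chosen fraction of clean inputs (the hypothetical `target_fpr`, 5% by default below) is flagged. The detection rate is then the fraction of adversarial inputs whose score exceeds that threshold. The helper names are hypothetical.

```python
import numpy as np


def calibrate_threshold(clean_scores, target_fpr: float = 0.05) -> float:
    """Choose a threshold from clean (non-adversarial) scores only.

    Roughly target_fpr of the clean scores lie above the returned value.
    """
    if not 0.0 < target_fpr < 1.0:
        raise ValueError("target_fpr must be in (0, 1)")
    return float(np.quantile(np.asarray(clean_scores, dtype=float), 1.0 - target_fpr))


def detection_metrics(clean_scores, adv_scores, threshold: float):
    """Return (detection_rate, false_positive_rate) for a given threshold."""
    clean = np.asarray(clean_scores, dtype=float)
    adv = np.asarray(adv_scores, dtype=float)
    false_positive_rate = float(np.mean(clean > threshold))
    detection_rate = float(np.mean(adv > threshold))
    return detection_rate, false_positive_rate
```

In this setup the clean scores would come from a held-out set of normal images; adversarial scores are needed only to report the detection rate, not to fit the threshold.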