An Adaptive Backdoor Defense Method for Image Inputs in MLaaS Scenarios

doi:10.19678/j.issn.1000-3428.0260273

Abstract

To address the difficulty of deploying backdoor defenses in Machine Learning as a Service (MLaaS) black-box scenarios, this paper proposes an adaptive image preprocessing defense framework that relies solely on statistical priors of natural images. The framework performs multi-dimensional feature analysis on each input image to construct a quantitative backdoor risk metric. According to the risk level, it dynamically selects and combines multi-level processing operations (compression-reconstruction, geometric transformations, color perturbations, and dynamic random sequences) to disrupt the activation conditions of potential backdoor triggers. A quality feedback mechanism balances defense effectiveness against visual usability. Experiments on the GTSRB, CIFAR-10, and MINI-ImageNet datasets evaluate five representative attacks: BadNets, Blended, WaNet, reflection attacks, and WaveAttack, which cover explicit patches, global blending, geometric warping, physical reflection, and frequency-domain perturbations, respectively. The results show that the proposed method reduces the average attack success rate to below 10% while preserving the model's normal classification performance (average accuracy drop of no more than 3.5%). Suppression is particularly strong for WaveAttack, whose success rate falls to as low as 2.38%. Ablation studies confirm that the adaptive strategy and the quality feedback mechanism are both critical to performance, and the framework performs stably across the three datasets of different scales, indicating good generalization. This research provides an efficient and practical adaptive backdoor defense for MLaaS black-box services.
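The abstract does not specify the risk metric, the individual operations, or any thresholds, so the following is only a minimal sketch of the general idea (score the input, pick operations by risk level, apply them in random order, and back off when quality drops too far). Every function name, the gradient-based risk proxy, the three stand-in operations, and the PSNR threshold are assumptions made for illustration and are not taken from the paper.

```python
import math
import random
from typing import List, Tuple

Image = List[List[float]]  # grayscale image, pixel values in [0, 255]


def risk_score(img: Image) -> float:
    """Hypothetical risk proxy: mean absolute horizontal gradient, scaled to [0, 1]."""
    diffs = [abs(row[x + 1] - row[x]) for row in img for x in range(len(row) - 1)]
    if not diffs:  # images narrower than two pixels carry no gradient
        return 0.0
    return min(1.0, (sum(diffs) / len(diffs)) / 64.0)


def quantize(img: Image, step: int) -> Image:
    """Coarse quantization, a stand-in for compression-reconstruction."""
    return [[min(255.0, float(round(p / step) * step)) for p in row] for row in img]


def shift(img: Image, dx: int) -> Image:
    """Cyclic horizontal shift, a stand-in for a small geometric transformation."""
    out = []
    for row in img:
        k = dx % len(row)
        out.append(row[-k:] + row[:-k] if k else row[:])
    return out


def jitter(img: Image, delta: float) -> Image:
    """Global brightness offset, a stand-in for a color perturbation."""
    return [[min(255.0, max(0.0, p + delta)) for p in row] for row in img]


def psnr(a: Image, b: Image) -> float:
    """Peak signal-to-noise ratio in dB, used here as the quality signal."""
    mse = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    mse /= len(a) * len(a[0])
    return float("inf") if mse == 0 else 10 * math.log10(255.0 ** 2 / mse)


def defend(img: Image, min_psnr: float = 20.0, seed: int = 0) -> Tuple[Image, int, int]:
    """Return (processed image, risk level 1-3, strength actually applied)."""
    rng = random.Random(seed)
    r = risk_score(img)
    level = 1 if r < 1 / 3 else 2 if r < 2 / 3 else 3
    # Quality feedback: start at the strength matching the risk level and
    # weaken the processing until the PSNR threshold is met (or strength is 1).
    for strength in range(level, 0, -1):
        ops = [
            lambda im, s=strength: quantize(im, 8 * s),
            lambda im, s=strength: jitter(im, rng.uniform(-4.0, 4.0) * s),
        ]
        if strength >= 2:
            ops.append(lambda im: shift(im, rng.choice([-1, 1])))
        rng.shuffle(ops)  # random operation order per input
        out = img
        for op in ops:
            out = op(out)
        if psnr(img, out) >= min_psnr:
            break
    return out, level, strength
```

In this sketch a uniform image gets risk level 1 and only light quantization and brightness jitter, while a high-contrast striped image gets level 3 and is then weakened by the feedback loop if the shifted result falls below the PSNR threshold. A real deployment would need a risk metric and quality measure validated against the attacks named above.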

Tong Songsong, Yang Kuiwu, Zhou Gang, Ding Mengdi. An Adaptive Backdoor Defense Method for Image Inputs in MLaaS Scenarios[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0260273.

仝松松, 杨奎武, 周刚, 丁梦迪. 面向MLaaS场景的自适应图像输入后门防御方法[J]. 计算机工程, doi: 10.19678/j.issn.1000-3428.0260273.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0260273

