Computer Engineering

   

An Adaptive Backdoor Defense Method for Image Inputs in MLaaS Scenarios

  

  • Published: 2026-06-12

Abstract: To address the difficulty of deploying backdoor defenses in black-box Machine Learning as a Service (MLaaS) scenarios, this paper proposes an adaptive image-preprocessing defense framework that requires only natural-image statistical priors. The framework analyzes multiple feature dimensions of each input image to construct a quantitative backdoor risk metric. Based on the resulting risk level, it dynamically selects and combines multi-level processing operations (compression-reconstruction, geometric transformations, color perturbations, and dynamic random sequences) to disrupt the activation conditions of potential backdoor triggers, and a quality feedback mechanism balances defense effectiveness against visual usability. Experiments on the GTSRB, CIFAR-10, and MINI-ImageNet datasets evaluate five representative attacks covering explicit patches, global blending, geometric warping, physical reflection, and frequency-domain perturbations: BadNets, Blended, WaNet, reflection attacks, and WaveAttack, respectively. The results show that the proposed method reduces the average attack success rate to below 10% while preserving the model's normal classification performance, with an average accuracy drop of no more than 3.5%. Suppression of WaveAttack is especially strong, with its attack success rate falling to as low as 2.38%. Ablation studies confirm that the adaptive strategy and the quality feedback mechanism are critical to these gains, and the framework performs stably across three datasets of different scales, indicating good generalization. This work offers an efficient and practical adaptive backdoor defense for black-box MLaaS services.
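The abstract does not specify the risk features, thresholds, or operator parameters, so the following NumPy sketch only illustrates the general shape of a risk-adaptive preprocessing loop under stated assumptions; it is not the authors' implementation. Everything concrete here is hypothetical: the risk score is a high-frequency spectral energy ratio (one possible natural-image statistic, since natural images concentrate energy at low frequencies), the thresholds 0.1 and 0.3 are invented, "compression-reconstruction" is approximated by 2x2 average pooling plus upsampling rather than a real codec, and quality feedback uses a PSNR floor with halving strength on each retry.

```python
import numpy as np

def high_freq_ratio(img, cutoff=0.25):
    """Hypothetical risk feature: share of (DC-removed) spectral energy above a
    normalized radial frequency cutoff. Larger values look less like a natural image."""
    gray = img.mean(axis=2)
    gray = gray - gray.mean()  # drop the DC term so mean brightness does not dominate
    spec = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.sqrt(((yy - h // 2) / h) ** 2 + ((xx - w // 2) / w) ** 2)
    total = spec.sum()
    return float(spec[r > cutoff].sum() / total) if total > 0 else 0.0

def compress_reconstruct(img, rng, s):
    # Stand-in for compression-reconstruction: 2x2 average pool, nearest upsample,
    # then blend with the input according to strength s.
    h, w, c = img.shape
    h2, w2 = h - h % 2, w - w % 2
    small = img[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2, c).mean(axis=(1, 3))
    out = img.copy()
    out[:h2, :w2] = small.repeat(2, axis=0).repeat(2, axis=1)
    return (1 - s) * img + s * out

def random_shift(img, rng, s):
    # Geometric transform: random translation of up to max_px pixels, edge-padded.
    max_px = max(1, int(round(3 * s)))
    dy, dx = rng.integers(-max_px, max_px + 1, size=2)
    padded = np.pad(img, ((max_px, max_px), (max_px, max_px), (0, 0)), mode="edge")
    h, w, _ = img.shape
    return padded[max_px + dy:max_px + dy + h, max_px + dx:max_px + dx + w]

def color_jitter(img, rng, s):
    # Color perturbation: random per-channel gain and offset, clipped to [0, 1].
    gain = 1 + rng.uniform(-0.1, 0.1, size=3) * s
    bias = rng.uniform(-0.05, 0.05, size=3) * s
    return np.clip(img * gain + bias, 0.0, 1.0)

OPS = [compress_reconstruct, random_shift, color_jitter]

def psnr(a, b):
    mse = float(np.mean((a - b) ** 2))
    return float("inf") if mse == 0 else 10 * np.log10(1.0 / mse)

def defend(img, rng, min_psnr=25.0, max_tries=4):
    """img: float array (H, W, 3) in [0, 1]. Returns (output, risk, n_ops, strength)."""
    risk = high_freq_ratio(img)
    # Hypothetical risk levels: more operations for riskier-looking inputs.
    n_ops = 1 if risk < 0.1 else 2 if risk < 0.3 else 3
    for attempt in range(max_tries):
        strength = 0.5 ** attempt  # quality feedback: weaken processing on each retry
        # Random choice and order of operations for this request.
        order = rng.permutation(len(OPS))[:n_ops]
        out = img
        for i in order:
            out = OPS[i](out, rng, strength)
        if psnr(img, out) >= min_psnr:
            break
    return out, risk, n_ops, strength
```

Randomizing both the choice and the order of operations per request is one way to make the transformation harder for an attacker to anticipate; the PSNR floor is a simple proxy for the "visual usability" constraint and could be swapped for SSIM or a task-specific check.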