
Computer Engineering

   

Enhancing Transferability of Adversarial Attacks on Large Vision-Language Models via Intermediate Layer Feature Alignment

  

  • Published: 2026-01-30


Abstract: Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal understanding and generation tasks. However, recent studies have revealed that these models exhibit significant vulnerability when exposed to adversarial attacks. Although several targeted black-box attack methods have been proposed to enhance the cross-model transferability of adversarial examples against LVLMs, their effectiveness and stability remain far from satisfactory. To address this issue, we propose a novel black-box targeted attack method with high transferability, termed Intermediate-Guided Transfer Attack (IGTA). The core idea of IGTA is to leverage a pre-trained vision encoder as a surrogate model and align the intermediate-layer features of the adversarial example with those of a target image. This intermediate-layer alignment strategy enables more direct and fine-grained manipulation of the model’s visual semantic understanding and high-level decision-making processes. Moreover, to further enhance transferability, the method incorporates fine-grained data augmentation techniques during optimization. Extensive black-box attack experiments on various mainstream LVLMs demonstrate that IGTA can efficiently generate highly transferable adversarial examples across different model architectures and task scenarios, significantly outperforming existing baseline approaches. Our findings reveal critical security risks in the visual reasoning components of current LVLMs and provide valuable insights for developing more robust multimodal models and corresponding defense mechanisms.
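Since the abstract only outlines the method, the sketch below illustrates the intermediate-layer feature-alignment idea rather than the authors' implementation. All concrete choices are assumptions for illustration: torchvision's ResNet-50 stands in for the surrogate vision encoder, its layer3 output is treated as the "intermediate layer", an L-infinity PGD-style update serves as the optimizer, and random resized crops stand in for the paper's fine-grained data augmentation. Hyperparameters are placeholders.

```python
# Illustrative sketch only: not the IGTA implementation described in the paper.
# Assumptions: ResNet-50 as surrogate encoder, layer3 as the intermediate layer,
# L_inf PGD-style updates, random crops/flips as the augmentation stand-in.
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"
encoder = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).to(device).eval()
for p in encoder.parameters():
    p.requires_grad_(False)

normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
augment = T.Compose([T.RandomResizedCrop(224, scale=(0.8, 1.0)),
                     T.RandomHorizontalFlip()])

# Capture the intermediate feature map with a forward hook.
_feat = {}
encoder.layer3.register_forward_hook(lambda m, i, o: _feat.update(mid=o))

def mid_features(x):
    """Return the hooked intermediate feature map for images in [0, 1]."""
    encoder(normalize(x))
    return _feat["mid"]

def feature_alignment_attack(source, target, eps=8 / 255, alpha=1 / 255,
                             steps=200, n_aug=4):
    """Optimize a bounded perturbation so the surrogate's intermediate features
    of the adversarial image align with those of the target image."""
    with torch.no_grad():
        target_feat = mid_features(target)
    delta = torch.zeros_like(source, requires_grad=True)
    for _ in range(steps):
        adv = torch.clamp(source + delta, 0.0, 1.0)
        # Average the alignment loss over several augmented views of the
        # adversarial image to encourage input-robust, transferable features.
        loss = torch.stack([
            F.mse_loss(mid_features(augment(adv)), target_feat)
            for _ in range(n_aug)
        ]).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend the feature distance
            delta.clamp_(-eps, eps)             # stay within the L_inf budget
        delta.grad = None
    return torch.clamp(source + delta.detach(), 0.0, 1.0)
```

Usage would be, for example, `adv = feature_alignment_attack(source, target)` with `source` and `target` as (1, 3, 224, 224) tensors in [0, 1] on the same device; the resulting adversarial image would then be fed to black-box LVLMs to evaluate transfer. The hypothetical name `feature_alignment_attack` and all hyperparameter values are placeholders, not taken from the paper.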
