
计算机工程 (Computer Engineering)


ABLNet: An Adaptive Bias Learning Network for Visual Question Answering

  • Published: 2025-05-23

Abstract: Visual Question Answering (VQA), which aims to understand and interpret an input image and its corresponding text question in order to provide a relevant natural-language answer, has become a promising research direction in cross-modal analysis. However, existing work depends heavily on dataset factors such as spurious correlations, dataset bias, and shortcut learning, all of which pose serious challenges to algorithm robustness. Existing ensemble-learning-based methods train a bias model to capture dataset bias, but because the bias model cannot adequately identify biased samples, it fails to fully learn the bias information, which in turn weakens the debiasing effect. To strengthen the bias model's capacity to learn dataset bias, this paper proposes an Adaptive Bias Learning Network (ABLNet) for the VQA task. ABLNet has two core designs. First, a self-adaptive sample reweighting mechanism dynamically assigns a weight to each sample based on its gradient information, enhancing the model's ability to learn biased features in the dataset and improving generalization. Second, a network pruning strategy based on restricted learning limits the bias model's learning capacity so that it relies on surface correlations and bias features in the dataset. Extensive experiments on the challenging VQA-CPv1, VQA-CPv2, and VQA-v2 datasets demonstrate the superiority of the proposed method.
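The two mechanisms described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual implementation: the function names (`adaptive_sample_weights`, `magnitude_prune`), the softmax-over-gradient-norms weighting rule, and the magnitude-based pruning criterion are all illustrative assumptions standing in for the gradient-based reweighting and restricted-learning pruning that ABLNet proposes.

```python
# Hedged sketch of the two ideas in the abstract; all names and the exact
# weighting/pruning rules are assumptions, not the paper's implementation.
import math


def adaptive_sample_weights(per_sample_grad_norms, temperature=1.0):
    """Map per-sample gradient norms to normalized sample weights.

    Samples with larger gradient norms (those the bias model has not yet
    fit well) receive larger weights via a temperature-scaled softmax.
    Weights are rescaled to sum to the batch size, preserving loss scale.
    """
    scaled = [g / temperature for g in per_sample_grad_norms]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    n = len(per_sample_grad_norms)
    return [n * e / z for e in exps]


def magnitude_prune(weights, keep_ratio=0.3):
    """Restrict model capacity by zeroing all but the largest weights.

    Keeps the top `keep_ratio` fraction of parameters by magnitude and
    zeros the rest, a simple stand-in for a restricted-learning pruning
    strategy that caps what the bias model can represent.
    """
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


# Usage: the hardest sample (largest gradient norm) gets the largest weight,
# and pruning keeps only the two largest-magnitude parameters.
weights = adaptive_sample_weights([0.1, 0.5, 2.0])
pruned = magnitude_prune([0.05, -0.9, 0.3, 0.01], keep_ratio=0.5)
```

In a real training loop these weights would rescale each sample's loss before backpropagation, and the pruning mask would be applied to the bias branch's parameters only, leaving the main model at full capacity.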