
Computer Engineering

   

Offensive Meme Detection Method Via Cross-Modal Meta-Learning With Unimodal Rectification

  

  • Online: 2026-05-15 Published: 2026-05-15

Offensive Meme Detection Integrating Cross-Modal Meta-Learning and Unimodal Rectification

Abstract: As digital platforms proliferate, offensive memes are becoming increasingly complex and diverse. This exacerbates the scarcity of high-quality annotated data and makes cross-modal semantic alignment bias under few-shot conditions a core constraint on detection performance. To address this issue, this study proposes an offensive meme detection method based on Cross-Modal Meta-Learning with Unimodal Rectification (CMML-UR). The method adopts a cross-modal dual-gradient meta-learning framework that combines the multi-level visual semantics provided by hierarchical image features with low-noise textual representations generated through multi-regularized modeling. At the decision fusion stage, it introduces a unimodal confidence-gated rectification mechanism that evaluates the output confidence of each modality at the sample level and dynamically calibrates the final prediction. Experimental results on the MultiOFF dataset show that the proposed method achieves a weighted F1-score of 74.6%, an improvement of 4.3 percentage points over the state-of-the-art (SOTA) model. In few-shot generalization tests, it maintains a weighted F1-score of 69.3%, 5.6 percentage points higher than the baseline model (63.7%), demonstrating its effectiveness in complex cross-modal semantic understanding and its robustness in noise suppression in few-shot scenarios.
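The abstract does not spell out the gating rule, so the following is only a minimal sketch of what a sample-level, confidence-gated rectification step could look like. The function names, the 0.9 threshold, and the blending formula are all illustrative assumptions, not the formulation used by CMML-UR. Each unimodal head contributes to the final prediction only when its own confidence (its maximum softmax probability) clears the threshold.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_rectify(fused_logits, image_logits, text_logits, threshold=0.9):
    """Hypothetical confidence gate (an assumption, not the paper's rule).

    A unimodal head is allowed to correct the fused prediction only if
    its confidence for this sample is at least `threshold`; its vote is
    then weighted by that confidence.
    """
    fused = softmax(fused_logits)
    final = list(fused)
    weight_sum = 1.0  # the fused prediction always has weight 1
    for logits in (image_logits, text_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:  # gate: low-confidence heads are ignored
            final = [f + confidence * p for f, p in zip(final, probs)]
            weight_sum += confidence
    return [f / weight_sum for f in final]


# Usage: the fused head is unsure, the image head is confident in class 1,
# and the text head is too uncertain to pass the gate.
rectified = gated_rectify([0.2, 0.0], [0.0, 4.0], [0.1, 0.0])
predicted_class = rectified.index(max(rectified))
```

In this sketch a modality that fails the gate leaves the fused prediction untouched, which is one simple way to keep an unreliable modality from injecting noise into the decision.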

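Abstract (translated from the Chinese): With the spread of digital platforms, offensive memes have become increasingly complex and diverse in form. This aggravates the shortage of high-quality annotated data and makes modal semantic alignment bias under few-shot conditions the core problem limiting detection performance. To address this, an offensive meme detection method integrating cross-modal meta-learning and unimodal rectification (CMML-UR) is proposed. The method first designs a cross-modal dual-gradient meta-learning framework that uses the multi-level visual semantics provided by coarse- and fine-grained hierarchical image features, combined with low-noise textual representations generated by multi-regularized text modeling, to achieve stable alignment and fast adaptation of cross-modal semantics and to improve generalization with few samples. At the decision fusion stage, a unimodal confidence-gated rectification mechanism is further introduced; based on a sample-level evaluation of each modality's output confidence, it adaptively suppresses noise from unreliable modalities and dynamically calibrates the prediction. Experimental results show that the proposed method reaches a weighted F1-score of 74.6% on the MultiOFF dataset, 4.3 percentage points above the SOTA model, and that in the few-shot generalization experiment its weighted F1-score remains at 69.3%, 5.6 percentage points above the baseline model (63.7%). These results verify its effectiveness in complex cross-modal semantic understanding and its robustness in noise suppression in few-shot scenarios.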
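The reported scores are weighted F1 values. As a reminder of how that metric is usually defined, the sketch below computes the per-class F1 and averages it with weights proportional to each class's share of the true labels. The function name and the toy labels are illustrative assumptions and are not drawn from MultiOFF.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += (count / total) * f1  # weight by class frequency
    return score


# Toy example: 1 = offensive, 0 = non-offensive (made-up labels).
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1]
score = weighted_f1(y_true, y_pred)  # approximately 0.6
```

Because the weights follow class frequency, the metric reflects performance on an imbalanced label distribution more faithfully than an unweighted (macro) average would.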