
Computer Engineering (计算机工程)


Multimodal Fake News Detection Based on Large Vision-Language Model

  • Published: 2025-08-28

Abstract: With the exponential growth of information disseminated on social media platforms, fake news detection has become a critical task in information authenticity verification. Existing methods focus on single-modality semantic analysis and fail to effectively model the cross-modal semantic contradictions in multimodal news; moreover, their decision processes lack credibility because no explainable auxiliary information supports them. To address these issues, this study proposes a large vision-language model detection framework for multimodal news with two main contributions: 1) the large vision-language model Qwen2.5-VL is introduced to reason over news content and generate image-text description sets that enhance the interpretability of detection; 2) a multi-granularity co-attention mechanism is designed to align textual, visual, and auxiliary-description features at multiple granularities. News prompt templates guide Qwen2.5-VL to extract key objects and scene elements from news images, and the model's language-generation ability is used to enrich the context of the news text, yielding explainable auxiliary evidence for the final decision. Built on co-attention layers, the multi-granularity co-attention fusion mechanism captures latent forgery patterns in news image-text pairs in a high-dimensional semantic space through hierarchical feature interaction. Experiments on the Weibo, GossipCop, and Pheme multimodal fake news datasets show that the framework achieves accuracies of 90.4%, 99.7%, and 86.6%, respectively.
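The abstract mentions news prompt templates that guide Qwen2.5-VL to extract key objects and scene elements from news images, but does not reproduce the template itself. The sketch below is a hypothetical illustration of what such a template might look like in a chat-message format; the wording and the `build_news_prompt` helper are assumptions, not the paper's actual prompt.

```python
# Hypothetical news prompt template for a vision-language model such as
# Qwen2.5-VL. The prompt wording is illustrative only; the paper's actual
# template is not given in the abstract.

def build_news_prompt(image_path: str, news_text: str) -> list:
    """Build a chat-style message list asking the model to describe the
    key objects and scene elements of a news image and to relate them
    to the accompanying news text."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {
                    "type": "text",
                    "text": (
                        "List the key objects and scene elements in this "
                        "news image, then state whether they are consistent "
                        f"with the following news text: {news_text}"
                    ),
                },
            ],
        }
    ]

messages = build_news_prompt(
    "example.jpg", "Flood waters submerge downtown streets."
)
```

The resulting `messages` list could then be passed to the model's chat interface; the generated description serves as the auxiliary, human-readable evidence the framework feeds into its fusion stage.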
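The co-attention layers underlying the fusion mechanism can be illustrated with a minimal numpy sketch of one bidirectional cross-attention step, assuming text tokens and image regions have already been projected into a shared d-dimensional space. The shapes, the absence of learned projection matrices, and the single-head form are simplifications; the paper's actual multi-granularity fusion stacks such interactions across feature levels.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text_feats, img_feats):
    """One co-attention step between modalities.

    text_feats: (n_tokens, d) text token features
    img_feats:  (n_regions, d) image region features
    Returns features of each modality re-expressed as a weighted
    combination of the other modality's features.
    """
    d = text_feats.shape[-1]
    # Text-to-image attention: each token gathers visual context.
    t2i = softmax(text_feats @ img_feats.T / np.sqrt(d))  # (n_tokens, n_regions)
    text_aligned = t2i @ img_feats                        # (n_tokens, d)
    # Image-to-text attention: each region gathers textual context.
    i2t = softmax(img_feats @ text_feats.T / np.sqrt(d))  # (n_regions, n_tokens)
    img_aligned = i2t @ text_feats                        # (n_regions, d)
    return text_aligned, img_aligned

rng = np.random.default_rng(0)
text = rng.standard_normal((12, 64))  # e.g. 12 text tokens
img = rng.standard_normal((49, 64))   # e.g. 7x7 grid of image patches
t_out, i_out = co_attention(text, img)
```

Cross-modal inconsistency then shows up as weak or scattered attention between the two aligned views, which is the kind of latent forgery pattern the abstract says the fusion mechanism is meant to capture.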