
Computer Engineering (计算机工程)


Multimodal Fake News Detection Based on Large Vision-Language Model

  • Published: 2025-08-28

Abstract: With the exponential growth of information disseminated on social media platforms, fake news detection has become a critical task in information authenticity verification. Existing methods focus on single-modality semantic analysis and fail to effectively model the cross-modal semantic contradictions in multimodal news; moreover, their decision processes lack credibility because no explainable auxiliary information supports them. To address these issues, this study proposes a large vision-language model detection framework for multimodal news with two main contributions: 1) the large vision-language model Qwen2.5-VL is introduced to reason over news content and generate image-text description sets that enhance the interpretability of detection; 2) a multi-granularity co-attention mechanism is designed to align textual, visual, and auxiliary-description features at multiple granularities. News prompt templates guide Qwen2.5-VL to extract key objects and scene elements from news images, and the model's language-generation ability is used to enrich the context of the news text, yielding explainable auxiliary evidence for the final decision. Built on co-attention layers, the multi-granularity co-attention fusion mechanism captures latent forgery patterns in news image-text pairs in a high-dimensional semantic space through hierarchical feature interaction. Experiments on the Weibo, GossipCop, and Pheme multimodal fake news datasets show that the framework achieves accuracies of 90.4%, 99.7%, and 86.6%, respectively.
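The abstract mentions news prompt templates that guide Qwen2.5-VL to extract key objects and scene elements from news images, but does not reproduce the template itself. The sketch below is a hypothetical illustration of what such a template might look like in a chat-message format; the wording and the `build_news_prompt` helper are assumptions, not the paper's actual prompt.

```python
# Hypothetical news prompt template for a vision-language model such as
# Qwen2.5-VL. The prompt wording is illustrative only; the paper's actual
# template is not given in the abstract.

def build_news_prompt(image_path: str, news_text: str) -> list:
    """Build a chat-style message list asking the model to describe the
    key objects and scene elements of a news image and to relate them
    to the accompanying news text."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {
                    "type": "text",
                    "text": (
                        "List the key objects and scene elements in this "
                        "news image, then state whether they are consistent "
                        f"with the following news text: {news_text}"
                    ),
                },
            ],
        }
    ]

messages = build_news_prompt(
    "example.jpg", "Flood waters submerge downtown streets."
)
```

The resulting `messages` list could then be passed to the model's chat interface; the generated description serves as the auxiliary, human-readable evidence the framework feeds into its fusion stage.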
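The co-attention layers underlying the fusion mechanism can be illustrated with a minimal numpy sketch of one bidirectional cross-attention step, assuming text tokens and image regions have already been projected into a shared d-dimensional space. The shapes, the absence of learned projection matrices, and the single-head form are simplifications; the paper's actual multi-granularity fusion stacks such interactions across feature levels.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(text_feats, img_feats):
    """One co-attention step between modalities.

    text_feats: (n_tokens, d) text token features
    img_feats:  (n_regions, d) image region features
    Returns features of each modality re-expressed as a weighted
    combination of the other modality's features.
    """
    d = text_feats.shape[-1]
    # Text-to-image attention: each token gathers visual context.
    t2i = softmax(text_feats @ img_feats.T / np.sqrt(d))  # (n_tokens, n_regions)
    text_aligned = t2i @ img_feats                        # (n_tokens, d)
    # Image-to-text attention: each region gathers textual context.
    i2t = softmax(img_feats @ text_feats.T / np.sqrt(d))  # (n_regions, n_tokens)
    img_aligned = i2t @ text_feats                        # (n_regions, d)
    return text_aligned, img_aligned

rng = np.random.default_rng(0)
text = rng.standard_normal((12, 64))  # e.g. 12 text tokens
img = rng.standard_normal((49, 64))   # e.g. 7x7 grid of image patches
t_out, i_out = co_attention(text, img)
```

Cross-modal inconsistency then shows up as weak or scattered attention between the two aligned views, which is the kind of latent forgery pattern the abstract says the fusion mechanism is meant to capture.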