Computer Engineering (计算机工程)

Review of Multimodal False Information Detection

  • Published: 2026-04-01

Abstract: In the digital era, complex interactions among modalities such as text, images, and audio have given rise to multimodal misinformation. It spreads faster and is harder to detect than traditional unimodal misinformation, posing serious challenges to information security and social governance. Research on this topic in China nevertheless remains scarce and has yet to form a complete framework. This paper therefore systematically reviews the current state and development of multimodal misinformation detection. After defining the core concepts and task taxonomy, it summarizes the characteristics of datasets and evaluation metrics and analyzes the applicable scenarios and detection performance of representative multimodal models, including SAFE, CAFE, CFFN, SSA-MFND, PSCC-Net, DGM4, CCN, SNIFFER, and KGAlign. It groups existing methods into three core categories: cross-modal consistency, anomalous feature identification, and external-fact-driven detection. It also discusses the interpretability, generalization, and robustness of multimodal misinformation detection. As large-scale vision-language models (LVLMs) are increasingly applied to this task, the paper reviews their application scenarios, advantages, and limitations. Finally, it outlines future research directions, with the aim of informing further work in the field.