
Computer Engineering ›› 2024, Vol. 50 ›› Issue (1): 289-295. doi: 10.19678/j.issn.1000-3428.0066412

• Development Research and Engineering Application •

Fake News Detection Based on Pre-Training and Multi-Modal Fusion

Haowei ZHOU1, Yong LIU1,*, Ping XUAN1,2,*

  1. School of Computer Science and Technology, Heilongjiang University, Harbin 150080, Heilongjiang, China
  2. Department of Computer Science and Technology, School of Engineering, Shantou University, Shantou 515063, Guangdong, China
  • Received:2022-12-01 Online:2024-01-15 Published:2024-02-21
  • Contact: Yong LIU, Ping XUAN
  • Supported by:
    National Natural Science Foundation of China (61972135); Natural Science Foundation of Heilongjiang Province (LH2020F043)

Abstract:

Existing multi-modal detection models typically just concatenate the features of each modality, so they cannot effectively model the correlations between modalities, and they are difficult to transfer to domains where labels are scarce. This paper proposes PMFD, a fake news detection model based on pre-training and multi-modal fusion. Features are extracted from different regions of the image accompanying a news item to serve as image raw vectors, and these raw vectors are merged into an image guide vector. Three multi-modal fusion stages are designed: early fusion, middle fusion, and late fusion. In the early fusion stage, the text feature extractor is initialized with the image guide vector to obtain text raw vectors, which are then merged into a text guide vector. In the middle fusion stage, the feature representation of each modality is constructed from that modality's set of raw vectors together with the guide vectors of the other modalities. In the late fusion stage, the feature representations of the different modalities are fused to construct the feature representation of the news item. To improve generalization, PMFD is first pre-trained on label-rich data and then fine-tuned on label-scarce data. Experimental results on publicly available data show that PMFD detects fake news effectively, improving by more than 10% over traditional models such as CNN, LSTM, and BERT, and by 2%-3% over the multi-modal fake news detection models EANN and M_model.

Key words: fake news detection, pre-training, multi-modal fusion, guide vector, cross-modal shared feature, stage fusion
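
The data flow through the three fusion stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how raw vectors are merged, how a guide vector conditions another modality, or how the final representations are fused. The sketch assumes element-wise mean pooling for merging, dot-product attention for middle fusion, concatenation for late fusion, and a toy recurrent encoder whose initial state is the image guide vector for early fusion. All function names (`merge`, `attend`, `encode_text`, `fuse_news`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def merge(vectors: List[Vector]) -> Vector:
    """Merge raw vectors into one guide vector (assumption: element-wise mean)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def attend(raw: List[Vector], guide: Vector) -> Vector:
    """Pool one modality's raw vectors, weighted by their similarity to a
    guide vector from another modality (assumption: dot-product attention)."""
    scores = [sum(r * g for r, g in zip(vec, guide)) for vec in raw]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * vec[i] for w, vec in zip(weights, raw))
            for i in range(len(guide))]


def encode_text(tokens: List[Vector], image_guide: Vector) -> List[Vector]:
    """Toy recurrent text encoder whose initial state is the image guide
    vector (assumption: stands in for the real text feature extractor)."""
    state = image_guide
    outputs = []
    for token in tokens:
        state = [math.tanh(0.5 * s + 0.5 * t) for s, t in zip(state, token)]
        outputs.append(state)
    return outputs


def fuse_news(image_raw: List[Vector], tokens: List[Vector]) -> Vector:
    image_guide = merge(image_raw)
    # Early fusion: the image guide vector initializes the text encoder.
    text_raw = encode_text(tokens, image_guide)
    text_guide = merge(text_raw)
    # Middle fusion: each modality's raw vectors meet the other's guide vector.
    image_repr = attend(image_raw, text_guide)
    text_repr = attend(text_raw, image_guide)
    # Late fusion: combine both representations (assumption: concatenation).
    return image_repr + text_repr


# Made-up 2-dimensional inputs: two image regions and three text tokens.
image_raw = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.2, 0.4], [0.6, 0.8], [1.0, 0.0]]
news_repr = fuse_news(image_raw, tokens)
print(len(news_repr))  # 4: two dimensions per modality, concatenated
```

In a trained model, each assumed operator would be a learned layer, and the resulting news representation would feed a classifier that is pre-trained on label-rich data and then fine-tuned on label-scarce data.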