
Computer Engineering

   

Rumor Detection Based on Large-Model Data Augmentation and Multi-Granularity Feature Fusion

  

  • Published: 2026-04-20


Abstract: With the rapid growth of the internet and social media, information is generated and disseminated faster than ever before. Misinformation, rumors, and other misleading content are increasingly widespread and pose significant threats to social governance, harmony, and stability. Rumor detection faces two challenges. First, rumor samples make up only a small share of the data, which leads to class imbalance, and existing text augmentation techniques do little to improve detection because they are not tailored to rumor style and generate low-quality text. Second, although pre-trained language models excel at capturing global dependencies in text, they often fail to focus on the key local features of rumors. To address these challenges, this study proposes a rumor detection framework based on large-model data augmentation and multi-granularity feature fusion. First, a rumor generation method that combines a rumor-style lexicon with large language models is proposed: a style lexicon is constructed from publicly available rumor datasets and used as a constraint to guide large language models in generating minority-class samples that are semantically coherent and consistent with rumor style. This alleviates data imbalance while preserving the quality of the augmented samples. Second, a multi-granularity contextual feature extractor is introduced. It combines the strength of pre-trained language models based on disentangled attention in capturing global dependencies with the ability of convolutional sub-layers to focus on local features. The extractor thereby captures both long-distance logical associations and fine-grained linguistic cues in rumor semantics, compensating for the inherent limitations of such pre-trained models in capturing key local features. Experimental results show that the proposed method achieves accuracies of 82.24% and 93.91% on the BuzzFeed and PolitiFact datasets, respectively.
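The lexicon-guided augmentation step can be illustrated with a minimal Python sketch. The abstract does not say how the style lexicon is scored or how the prompt is worded, so everything here is an assumption made for illustration: the smoothed log-odds ranking, the hypothetical function names `build_style_lexicon` and `build_generation_prompt`, and the prompt text. No language model is called; the sketch only shows how a lexicon built from labeled text could constrain a generation request.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def build_style_lexicon(rumor_texts, other_texts, top_k=5, min_count=2):
    """Rank words by smoothed log-odds of occurring in rumor vs. other texts.

    Assumption: log-odds scoring is one plausible way to pick
    rumor-style words; the paper's actual criterion is not given.
    """
    rumor_counts = Counter(w for t in rumor_texts for w in tokenize(t))
    other_counts = Counter(w for t in other_texts for w in tokenize(t))
    vocab_size = len(set(rumor_counts) | set(other_counts))
    # Add-one smoothing keeps the ratio finite for words unseen in one class.
    rumor_total = sum(rumor_counts.values()) + vocab_size
    other_total = sum(other_counts.values()) + vocab_size

    scores = {}
    for word, count in rumor_counts.items():
        if count < min_count:
            continue  # skip words too rare to be a reliable style signal
        p_rumor = (count + 1) / rumor_total
        p_other = (other_counts[word] + 1) / other_total
        scores[word] = math.log(p_rumor / p_other)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def build_generation_prompt(lexicon, topic, n_terms=3):
    """Compose a prompt that asks a language model to use lexicon terms.

    Assumption: the prompt wording is hypothetical, not the paper's.
    """
    terms = ", ".join(lexicon[:n_terms])
    return (
        f"Write one short social media post about {topic} "
        f"in the style of a rumor. Use these style words: {terms}. "
        "Keep the post semantically coherent."
    )
```

The resulting prompt would be passed to whichever large language model is used for augmentation, and the generated posts added to the minority (rumor) class before training the detector.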
