Rumor Detection Based on Large-Model Data Augmentation and Multi-Granularity Feature Fusion

doi:10.19678/j.issn.1000-3428.0260022

Abstract

With the rapid development of the internet and social media, information is generated and disseminated faster than ever before. Misinformation, rumors, and other misleading content have become increasingly widespread, threatening social governance, harmony, and stability. Rumor detection faces two problems. First, rumor samples make up only a small proportion of the data, which leads to class imbalance, and existing text augmentation techniques do little to improve detection because they are not tailored to rumor style and generate low-quality text. Second, although pre-trained language models excel at capturing global dependencies in text, they often fail to focus on the key local features of rumors. To address these challenges, this study proposes a rumor detection framework based on large-model data augmentation and multi-granularity feature fusion. First, a rumor generation method that integrates a rumor-style lexicon with large language models is proposed. A style lexicon is constructed from publicly available rumor datasets and used as a constraint to guide large language models in generating minority-class samples that are semantically coherent and consistent with rumor style. This alleviates data imbalance while preserving the quality of the augmented samples. Second, this study introduces a multi-granularity contextual feature extractor. It combines a pre-trained language model based on a disentangled attention mechanism, which is strong at capturing global dependencies, with convolutional sub-layers that focus on local features. The extractor thus captures long-distance logical associations and fine-grained linguistic cues in rumor semantics at the same time, compensating for the inherent weakness of such pre-trained models in capturing key local features. Experimental results show that the proposed detection method achieves accuracies of 82.24% and 93.91% on the BuzzFeed and PolitiFact datasets, respectively.
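The lexicon-guided augmentation step can be illustrated with a minimal sketch. The abstract does not say how the style lexicon is scored or how the prompt is worded, so everything below is an assumption made for illustration: the smoothed log-ratio scoring, the function names `build_style_lexicon` and `build_prompt`, the prompt template, and the toy texts. The sketch stops at producing the prompt string and does not call any large language model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_style_lexicon(rumors, non_rumors, top_k=5, smoothing=1.0):
    """Rank words seen in rumors by a smoothed log-ratio against non-rumors.

    Assumption: the paper's actual lexicon construction may differ; a
    log-ratio of add-one-smoothed relative frequencies is one simple choice.
    """
    rumor_counts = Counter(tok for t in rumors for tok in tokenize(t))
    other_counts = Counter(tok for t in non_rumors for tok in tokenize(t))
    vocab_size = len(set(rumor_counts) | set(other_counts))
    rumor_total = sum(rumor_counts.values()) + smoothing * vocab_size
    other_total = sum(other_counts.values()) + smoothing * vocab_size

    scores = {}
    for word, count in rumor_counts.items():
        p_rumor = (count + smoothing) / rumor_total
        p_other = (other_counts[word] + smoothing) / other_total
        scores[word] = math.log(p_rumor / p_other)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def build_prompt(seed_text, lexicon):
    """Build a hypothetical generation prompt constrained by the lexicon."""
    style_words = ", ".join(lexicon)
    return (
        "Rewrite the following text as a new sample of the same class. "
        "Keep it semantically coherent and use wording in this style: "
        f"{style_words}.\n"
        f"Text: {seed_text}"
    )


# Toy data, invented for this example only.
rumors = ["SHOCKING secret cure they hide", "shocking truth exposed secret"]
non_rumors = ["the council approved the budget",
              "officials confirmed the budget report"]

lexicon = build_style_lexicon(rumors, non_rumors, top_k=2)
prompt = build_prompt(rumors[0], lexicon)
print(lexicon)
print(prompt)
```

In a full pipeline, the prompt would be sent to a large language model and the generated texts added to the minority class before training the detector; that part is omitted here because the abstract does not specify the model or the generation settings.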


LIANG Yu, MA Jiayan, HU Xiyuan, WANG Ziheng, LIU Wen, PENG Tianhao, LI Ying. Rumor Detection Based on Large-Model Data Augmentation and Multi-Granularity Feature Fusion[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0260022.

梁堉, 马佳妍, 胡晰远, 王子恒, 刘文, 彭天豪, 李莹. 基于大模型数据增强的多粒度特征融合谣言检测[J]. 计算机工程, doi: 10.19678/j.issn.1000-3428.0260022.


URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0260022

