
Computer Engineering



Mitigating Factuality Hallucination in LLMs with Semantic-Entropy-Based Reinforcement Learning and Multi-Agent Collaboration

  • Published: 2025-06-03

Abstract: Factual hallucination in large language models (LLMs) refers to the generation of content that conflicts with real-world facts, a problem that significantly reduces the credibility and applicability of these models in high-stakes domains such as healthcare, law, and scientific research. Existing mitigation methods rely mainly on input optimization, supervised learning, or external knowledge bases, but they suffer from limited generalization, heavy dependence on large-scale labeled data, and poor real-time performance, and therefore struggle to fundamentally improve a model's preference for factual output. To address these limitations, this paper proposes a factual hallucination mitigation framework based on reinforcement learning with semantic-entropy feedback. Semantic entropy is introduced as a measure of semantic-level uncertainty, allowing the framework to assess the model's confidence in its own generations precisely and to embed that confidence as a reward signal in reinforcement learning, so that the model learns to avoid answers with a high risk of hallucination during generation. Compared with traditional predictive-entropy-based methods, semantic entropy more effectively recognizes semantically equivalent expressions and improves factuality without relying on external knowledge bases. Experiments on multiple public datasets show that, while preserving the richness and coherence of the generated content, the proposed method improves factual judgment accuracy by up to 5.7% and factual generation accuracy by up to 7.8% over the strongest baseline, demonstrating its superiority in mitigating factual hallucination.
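The mechanism described in the abstract can be illustrated with a short sketch: sample several answers to the same prompt, cluster them by semantic equivalence, compute the entropy over the clusters, and feed its negation back as a reward during reinforcement-learning fine-tuning. The Python sketch below is a minimal illustration of that idea under stated assumptions, not the paper's implementation; the same_meaning equivalence judge, the sample set, and the scale factor are hypothetical placeholders (in practice the judge is typically a bidirectional-entailment check with an NLI model).

from math import log
from typing import Callable, List


def semantic_entropy(answers: List[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Entropy over clusters of semantically equivalent sampled answers.

    High entropy means the samples disagree in meaning (the model is
    uncertain about the underlying fact); low entropy means it is confident.
    """
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):   # join an existing meaning cluster
                cluster.append(ans)
                break
        else:                                   # no cluster matched: open a new one
            clusters.append([ans])

    n = len(answers)
    probs = [len(c) / n for c in clusters]
    return -sum(p * log(p) for p in probs)


def entropy_reward(answers: List[str],
                   same_meaning: Callable[[str, str], bool],
                   scale: float = 1.0) -> float:
    """Reward signal for RL fine-tuning: semantically self-consistent
    (low-entropy) answer sets earn a higher reward, pushing the policy
    away from responses it is uncertain about."""
    return -scale * semantic_entropy(answers, same_meaning)


if __name__ == "__main__":
    # Toy equivalence judge: normalized exact match. A real judge would be an
    # entailment (NLI) model deciding whether two answers state the same fact.
    norm = lambda s: s.lower().strip().rstrip(".")
    same = lambda a, b: norm(a) == norm(b)

    confident = ["Paris.", "paris", "Paris"]        # one semantic cluster
    uncertain = ["Paris.", "Lyon.", "Marseille."]   # three distinct clusters
    print(semantic_entropy(confident, same))        # -0.0 (single cluster: fully confident)
    print(semantic_entropy(uncertain, same))        # ~1.0986 (three clusters: maximally uncertain)

In a full training loop such a reward would normally be combined with quality or factuality terms and optimized with a policy-gradient method such as PPO; the abstract does not specify those details, so the sketch stops at the reward itself.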