
Computer Engineering



Mitigating Factuality Hallucination in LLMs with Semantic-Entropy-Based Reinforcement Learning and Multi-Agent Collaboration

  • Published: 2025-06-03

Abstract: Factual hallucination in large language models (LLMs) refers to the generation of content that conflicts with real-world facts, a problem that significantly reduces the credibility and applicability of these models in high-stakes domains such as healthcare, law, and scientific research. Existing mitigation methods rely mainly on input optimization, supervised learning, or external knowledge bases, but they suffer from limited generalization, heavy dependence on large-scale labeled data, and poor real-time performance, and therefore struggle to fundamentally improve a model's preference for factual output. To address these limitations, this paper proposes a factual hallucination mitigation framework based on reinforcement learning with semantic-entropy feedback. Semantic entropy is introduced as a measure of semantic-level uncertainty, allowing the framework to assess the model's confidence in its own generations precisely and to embed that confidence as a reward signal in reinforcement learning, so that the model learns to avoid answers with a high risk of hallucination during generation. Compared with traditional predictive-entropy-based methods, semantic entropy more effectively recognizes semantically equivalent expressions and improves factuality without relying on external knowledge bases. Experiments on multiple public datasets show that, while preserving the richness and coherence of the generated content, the proposed method improves factual judgment accuracy by up to 5.7% and factual generation accuracy by up to 7.8% over the strongest baseline, demonstrating its superiority in mitigating factual hallucination.
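The mechanism described in the abstract can be illustrated with a short sketch: sample several answers to the same prompt, cluster them by semantic equivalence, compute the entropy over the clusters, and feed its negation back as a reward during reinforcement-learning fine-tuning. The Python sketch below is a minimal illustration of that idea under stated assumptions, not the paper's implementation; the same_meaning equivalence judge, the sample set, and the scale factor are hypothetical placeholders (in practice the judge is typically a bidirectional-entailment check with an NLI model).

from math import log
from typing import Callable, List


def semantic_entropy(answers: List[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Entropy over clusters of semantically equivalent sampled answers.

    High entropy means the samples disagree in meaning (the model is
    uncertain about the underlying fact); low entropy means it is confident.
    """
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):   # join an existing meaning cluster
                cluster.append(ans)
                break
        else:                                   # no cluster matched: open a new one
            clusters.append([ans])

    n = len(answers)
    probs = [len(c) / n for c in clusters]
    return -sum(p * log(p) for p in probs)


def entropy_reward(answers: List[str],
                   same_meaning: Callable[[str, str], bool],
                   scale: float = 1.0) -> float:
    """Reward signal for RL fine-tuning: semantically self-consistent
    (low-entropy) answer sets earn a higher reward, pushing the policy
    away from responses it is uncertain about."""
    return -scale * semantic_entropy(answers, same_meaning)


if __name__ == "__main__":
    # Toy equivalence judge: normalized exact match. A real judge would be an
    # entailment (NLI) model deciding whether two answers state the same fact.
    norm = lambda s: s.lower().strip().rstrip(".")
    same = lambda a, b: norm(a) == norm(b)

    confident = ["Paris.", "paris", "Paris"]        # one semantic cluster
    uncertain = ["Paris.", "Lyon.", "Marseille."]   # three distinct clusters
    print(semantic_entropy(confident, same))        # -0.0 (single cluster: fully confident)
    print(semantic_entropy(uncertain, same))        # ~1.0986 (three clusters: maximally uncertain)

In a full training loop such a reward would normally be combined with quality or factuality terms and optimized with a policy-gradient method such as PPO; the abstract does not specify those details, so the sketch stops at the reward itself.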