Abstract:
In recent years, large language models have demonstrated exceptional performance on natural language processing tasks. However, in domain-specific question answering tasks such as those in the medical field, lightweight large language models lack sufficient vertical-domain knowledge, which undermines the reliability and accuracy of their generated answers. To improve the accuracy of lightweight large language models in medical question answering, this paper proposes ERKF-MedQA, a knowledge graph-enhanced medical question answering approach for large language models based on entity recognition and knowledge filtering. The approach consists of two main components: precise initial entity recognition and knowledge filtering. Entity recognition uses a multi-stage prompting method: entity normalization retrieval is first performed on the input question, and the retrieved entities are then assessed for relevance to determine the final valid entities. Knowledge filtering is performed by a Multi-Task Semantic Scoring Model (M-TSSM), which fuses question and path information, scores the initially retrieved knowledge, and retains only the knowledge highly relevant to the question. Finally, the filtered knowledge is integrated into a prompt and fed to the large language model, which performs reasoning and generates the answer. Experimental results show that the proposed method outperforms all baseline models on BERTScore; compared with the best-performing baseline, it improves Precision, Recall, and F1-score by 0.44%, 0.25%, and 0.34%, respectively.
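The pipeline described above (two-stage entity recognition, M-TSSM-style path scoring, and prompt assembly) can be sketched as follows. This is a minimal illustrative sketch only: every function name, the toy relevance judge, and the toy path scorer are assumptions for exposition, not the authors' actual implementation or the real M-TSSM.

```python
def recognize_entities(question, kg_entities, judge):
    """Two-stage entity recognition: normalization retrieval over the
    question, then relevance assessment of the retrieved candidates."""
    # Stage 1: retrieve candidate canonical entities mentioned in the question.
    candidates = [e for e in kg_entities if e.lower() in question.lower()]
    # Stage 2: keep only candidates the relevance judge accepts.
    return [e for e in candidates if judge(question, e)]

def score_paths(question, paths, scorer):
    """Stand-in for M-TSSM scoring: score each (head, relation, tail)
    knowledge path jointly with the question, keep high-scoring paths."""
    scored = [(scorer(question, p), p) for p in paths]
    return [p for s, p in sorted(scored, reverse=True) if s > 0.5]

def build_prompt(question, knowledge):
    """Fold the filtered knowledge into the prompt fed to the LLM."""
    facts = "\n".join(" -> ".join(p) for p in knowledge)
    return f"Known medical facts:\n{facts}\n\nQuestion: {question}\nAnswer:"

# Toy components so the sketch runs end to end (purely illustrative).
kg = ["aspirin", "headache", "ibuprofen"]
paths = [("aspirin", "treats", "headache"),
         ("aspirin", "interacts_with", "ibuprofen")]
judge = lambda q, e: True                        # accept every retrieved entity
scorer = lambda q, p: 1.0 if p[2] in q else 0.0  # keep paths whose tail appears in q

q = "can aspirin be used to relieve a headache?"
entities = recognize_entities(q, kg, judge)
knowledge = score_paths(q, paths, scorer)
prompt = build_prompt(q, knowledge)
```

In the real method, `judge` and `scorer` would be the LLM-driven relevance assessment and the trained M-TSSM, respectively; the sketch only shows how the two filtering stages compose before prompt construction.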
Wang Huiyong, Zhou Rumeng, Zhang Yi, Feng Tao, Zhang Xiaoming. Knowledge Graph-Enhanced Large Model Question Answering Method Based on Entity Recognition and Knowledge Filtering[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252996.