
Computer Engineering ›› 2024, Vol. 50 ›› Issue (6): 86-93. doi: 10.19678/j.issn.1000-3428.0068081

• Artificial Intelligence and Pattern Recognition •

Question Generation Model Based on Text Knowledge Enhancement

CHEN Jiayu, WANG Yuanlong, ZHANG Hu   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China
  • Received: 2023-07-17  Revised: 2023-10-10  Online: 2024-06-15  Published: 2024-06-20
  • Corresponding author: WANG Yuanlong, E-mail: ylwang@sxu.edu.cn
  • Funding: National Natural Science Foundation of China (62176145).

Abstract: Pre-trained language models, which are trained on large-scale datasets with extensive computing power, can extract significant amounts of knowledge from unstructured text data. To address the limited information in triplets, a question generation method is proposed that utilizes the rich knowledge in pre-trained language models. First, a textual knowledge generator is designed to enhance the semantics of the triplets by leveraging the extensive knowledge embedded in the pre-trained models; this generator transforms the information within the triplets into subgraph descriptions. Next, a question type predictor is employed to determine the appropriate question words. These question words help to locate the domain of the answer accurately, resulting in semantically coherent questions and enhanced control over the generation process. Finally, a controlled generation framework is developed to ensure that both key entities and question words appear in the generated questions, thereby increasing the accuracy of these questions. The efficacy of the proposed model is demonstrated on the public datasets WebQuestion and PathQuestion. Compared with the existing model LFKQG, the proposed model improves the BLEU-4, METEOR, and ROUGE-L metrics by 0.28, 0.16, and 0.22 percentage points, respectively, on the WebQuestion dataset, and by 0.8, 0.39, and 0.46 percentage points, respectively, on the PathQuestion dataset.

Key words: natural language understanding, question generation, knowledge graph (KG), pre-trained language model, knowledge enhancement
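The abstract describes two concrete steps: rendering a triple as a textual subgraph description, and constraining generation so that the key entity and the predicted question word both appear in the output. The following is a minimal, hypothetical sketch of those ideas as a post-hoc candidate filter; the function names, the prompt format, and the filtering strategy are illustrative assumptions, not the paper's actual implementation (which integrates the constraints into the decoding of a pre-trained model).

```python
def triple_to_description(subject: str, relation: str, obj: str) -> str:
    """Render a KG triple as a plain-text subgraph description.

    Illustrative stand-in for the paper's textual knowledge generator,
    which uses a pre-trained model to produce richer descriptions.
    """
    return f"{subject} {relation.replace('_', ' ')} {obj}."

def satisfies_constraints(question: str, key_entity: str, question_word: str) -> bool:
    """Accept a candidate only if it contains the key entity and
    starts with the predicted question word (a simple lexical check)."""
    q = question.lower()
    return key_entity.lower() in q and q.split()[0] == question_word.lower()

def pick_question(candidates, key_entity, question_word):
    """Return the first candidate satisfying both constraints, else None."""
    for cand in candidates:
        if satisfies_constraints(cand, key_entity, question_word):
            return cand
    return None

candidates = [
    "When was Barack Obama born?",      # wrong question word, rejected
    "Where was Barack Obama born?",     # key entity + question word, accepted
    "Where was he born?",               # key entity missing, rejected
]
print(triple_to_description("Barack Obama", "place_of_birth", "Honolulu"))
# → Barack Obama place of birth Honolulu.
print(pick_question(candidates, "Barack Obama", "where"))
# → Where was Barack Obama born?
```

In the actual model the constraints guide decoding itself rather than filtering finished candidates, which avoids wasting generation budget on strings that can never satisfy the constraints.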
