
Computer Engineering ›› 2024, Vol. 50 ›› Issue (7): 112-122. doi: 10.19678/j.issn.1000-3428.0068020

• Artificial Intelligence and Pattern Recognition •

Knowledge Graph Completion with Knowledge Enhancement and Contrastive Learning

Juan LIU, Youxiang DUAN*(), Yuxi LU, Lu ZHANG   

  1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China
  • Received: 2023-07-06 Online: 2024-07-15 Published: 2024-07-23
  • Contact: Youxiang DUAN
  • Supported by: the Fundamental Research Funds for the Central Universities (20CX05017A); the Major Science and Technology Project of CNPC (ZD2019-183-006)


Abstract:

Knowledge Graph Completion(KGC) is an important means of improving the quality of Knowledge Graphs(KGs). Existing KGC methods are mainly divided into structure-based and description-based methods. Structure-based methods perform poorly when reasoning over the long-tailed entities that are common in KGs, while description-based methods make insufficient use of descriptive information and of the information carried by negative samples. To address these problems, this paper proposes KEKGC, a KGC method with knowledge enhancement and contrastive learning. A manually defined template converts each triple and its descriptive information into a coherent natural language statement, which is fed to a Pre-trained Language Model(PLM) to strengthen the model's understanding of both the structural and the descriptive knowledge of the triple. On this basis, a contrastive learning framework is introduced to improve the efficiency and accuracy of link prediction: a memory bank stores entity embedding vectors, from which positive and negative samples are selected and trained with the InfoNCE loss. Experimental results on the WN18RR dataset show that, compared with MEM-KGC, KEKGC improves the Mean Reciprocal Rank(MRR) of link prediction by 5.5 and improves the Hits@1, Hits@3, and Hits@10 metrics by 2.8, 0.7, and 4.2 percentage points, respectively, while reaching an accuracy of 94.1% on the triple classification task. Hence, the proposed method achieves higher prediction accuracy and better generalization ability, especially for long-tailed entities, effectively improving both the quality and the efficiency of graph completion.
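The paper's implementation is not reproduced here; purely as an illustration of the two mechanisms the abstract describes, the following is a minimal, hypothetical Python sketch. The template wording in `triple_to_sentence` is an assumption, not the paper's actual template, and `info_nce`/`MemoryBank` are plain-Python stand-ins for the embedding-based contrastive training step (selecting a positive and memory-bank negatives, then scoring them with the InfoNCE loss).

```python
import math

def triple_to_sentence(head, relation, tail, head_desc):
    """Render a triple plus an entity description as one natural
    language statement for a PLM (illustrative template only)."""
    return f"{head}, {head_desc}, {relation} {tail}."

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query: negative log-softmax score of the
    positive against the positive plus all negative samples."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

class MemoryBank:
    """FIFO bank of entity embeddings used as a pool of negatives."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.bank = []

    def push(self, emb):
        self.bank.append(emb)
        if len(self.bank) > self.capacity:
            self.bank.pop(0)  # evict the oldest embedding

    def negatives(self, exclude_idx):
        """All stored embeddings except the one at exclude_idx."""
        return [e for i, e in enumerate(self.bank) if i != exclude_idx]
```

In this sketch the loss is near zero when the query matches its positive and large when it matches a negative instead, which is the gradient signal that pulls matching entity embeddings together and pushes mismatched ones apart.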

Key words: Knowledge Graph(KG), Pre-trained Language Model(PLM), link prediction, contrastive learning, entity description