
Computer Engineering ›› 2024, Vol. 50 ›› Issue (4): 168-176. doi: 10.19678/j.issn.1000-3428.0067543

• Artificial Intelligence and Pattern Recognition •

  • Corresponding author: ZHANG Xuan, E-mail: zhxuan@ynu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61862063, 61502413, 61262025); Innovation Project of Yunnan Power Grid Co., Ltd. (YNKJXM20222254); Young and Middle-aged Academic and Technical Leaders Reserve Talent Program of Yunnan Province (202205AC160040); Academician and Expert Workstation Program of Yunnan Province (202205AF150006); Major Science and Technology Special Plan Project of Yunnan Province (202202AE090066); Scientific Research Fund of the Yunnan Provincial Department of Education (2023Y0256); "Knowledge-Driven Intelligent Software Engineering Research and Innovation Team" Project of the School of Software, Yunnan University.

Knowledge Graph Completion Based on Contrastive Learning and Language Model-Enhanced Embedding

ZHANG Hongchen1, LI Linyu2, YANG Li3, SAN Chenjun1, YIN Chunlin3, YAN Bing1, YU Hong3, ZHANG Xuan2,4,5   

  1. Policy Research and Enterprise Management Department, Yunnan Power Grid Co., Ltd., Kunming 650032, Yunnan, China;
    2. School of Software, Yunnan University, Kunming 650091, Yunnan, China;
    3. Electric Power Research Institute, Yunnan Power Grid Co., Ltd., Kunming 650217, Yunnan, China;
    4. Key Laboratory of Software Engineering of Yunnan Province, Kunming 650091, Yunnan, China;
    5. Engineering Research Center of the Ministry of Education on Cross-Border Cyberspace Security, Kunming 650091, Yunnan, China
  • Received:2023-05-04 Revised:2023-07-04 Published:2024-04-15


Abstract: A knowledge graph is a structured knowledge base composed of various knowledge or data units obtained through extraction and related processes; it describes and represents information such as entities, concepts, facts, and relationships. The limitations of Natural Language Processing (NLP) technology and the noise present in the text of these knowledge or information units both affect the accuracy of information extraction to some degree. Existing Knowledge Graph Completion (KGC) methods typically consider only structural information or only textual semantic information, ignoring the fact that both kinds of information coexist throughout the knowledge graph. To address this problem, a KGC model based on contrastive learning and language model-enhanced embedding is proposed. A pretrained language model extracts the textual semantic information of the input entities and relationships, while the distance-based scoring function of a translation model captures the structural information of the knowledge graph. Two negative sampling methods for contrastive learning are combined to train the model and improve its ability to represent positive and negative samples. Experimental results show that, compared with the Knowledge Graph completion model based on Bidirectional Encoder Representations from Transformers (KG-BERT), the proposed model improves Hits@10 (the average proportion of triples ranked within the top 10) in link prediction by 31% and 23% on the WN18RR and FB15K-237 datasets, respectively, clearly outperforming the comparison models.
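The two ingredients named in the abstract, a translation-model distance score over triples and a contrastive loss over a positive triple and sampled negatives, can be sketched as follows. This is a minimal illustration only, assuming a TransE-style L1 distance ||h + r − t|| and an InfoNCE-style contrastive objective; the paper's exact scoring function, negative sampling strategies, and loss may differ.

```python
import numpy as np

def distance_score(h, r, t):
    """TransE-style distance: smaller means a more plausible triple (h, r, t).
    Assumed form for illustration; the paper's scoring function may differ."""
    return np.linalg.norm(h + r - t, ord=1)

def contrastive_loss(pos_score, neg_scores, temperature=0.05):
    """InfoNCE-style loss over one positive and sampled negatives.
    Distances are negated so the positive should get the highest logit."""
    logits = np.concatenate(([-pos_score], -np.asarray(neg_scores))) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # the positive sits at index 0

# Toy embeddings: a near-perfect positive tail versus random negative tails.
rng = np.random.default_rng(0)
h, r = rng.normal(size=64), rng.normal(size=64)
t_pos = h + r + rng.normal(scale=0.01, size=64)
t_negs = [rng.normal(size=64) for _ in range(8)]

pos = distance_score(h, r, t_pos)
negs = [distance_score(h, r, t) for t in t_negs]
loss = contrastive_loss(pos, negs)   # low loss: positive is well separated
```

In training, minimizing this loss pushes the distance of true triples below that of corrupted ones, which is how the model fuses the structural signal (the distance score) with the contrastive objective.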

Key words: Knowledge Graph Completion(KGC), knowledge graph, contrastive learning, pretrained language model, link prediction
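The Hits@10 figure cited in the abstract is the standard link-prediction metric: the fraction of test triples whose correct entity is ranked within the top 10 by the model. A minimal sketch, with hypothetical ranks for illustration:

```python
def hits_at_k(ranks, k=10):
    """Hits@K: fraction of test triples whose correct entity is ranked
    within the top K by the model's scoring function."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)

# Hypothetical ranks of the correct entity for five test triples:
ranks = [1, 4, 12, 7, 150]
print(hits_at_k(ranks, k=10))  # → 0.6 (three of the five ranks are <= 10)
```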

CLC Number: