
Computer Engineering ›› 2023, Vol. 49 ›› Issue (3): 95-104. doi: 10.19678/j.issn.1000-3428.0064167

• Artificial Intelligence and Pattern Recognition •

Knowledge Graph Embedding Negative Sampling Method Fused with Entity Neighborhood Information

ZHAI Sheping1,2, ZHANG Yuhang1, BAI Xiaoxia1

  1. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China;
  2. Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an 710121, China
  • Received: 2022-03-14 Revised: 2022-04-21 Published: 2022-05-04
  • About the authors: ZHAI Sheping (born 1971), male, professor, Ph.D.; main research interests: natural language processing and machine learning. ZHANG Yuhang and BAI Xiaoxia, master's students.
  • Funding:
    National Natural Science Foundation of China (61373116); Communication Soft Science Project of the Ministry of Industry and Information Technology (2018R26); Key Research and Development Program of Shaanxi Province (2022GY-038); Shaanxi Provincial College Students' Innovation and Entrepreneurship Training Program (S202111664077); Graduate Innovation Fund of Xi'an University of Posts and Telecommunications (CXJJLY202027).

Abstract: Knowledge Graph Embedding (KGE) embeds entities and relations into a low-dimensional, continuous vector space. During model training, both positive and negative triples must be provided. Most existing negative sampling methods construct negative samples by uniform random sampling, and samples obtained this way contribute little to model training. With a Generative Adversarial Network (GAN), the generator can sample more plausible negative triples, which enhances the performance of the embedding model. However, because the sampled data are discrete, gradient-based training suffers from vanishing gradients. To address these problems, this paper proposes a KGE negative sampling method fused with entity neighborhood information. Built on the GAN framework, the method uses a Graph Convolutional Network (GCN) to aggregate the neighborhood information of entities along different relation paths, which helps the generator produce high-quality negative samples and improves the performance of the discriminator. In the discriminator, the Wasserstein distance replaces the traditional divergence to solve the vanishing gradient problem and accelerate model convergence. The proposed method is evaluated on link prediction and triple classification tasks. The results show that, in link prediction, MR, MRR, and Hits@10 improve on the baseline models by an average of 4.18, 9.19, and 10.18 percentage points, respectively, and that accuracy in triple classification increases by 4.50 percentage points on average. This confirms that integrating entity neighborhood information further improves the quality of negative samples and the performance of the model.

Key words: Knowledge Graph Embedding (KGE), Generative Adversarial Network (GAN), neighborhood information, Graph Convolutional Network (GCN), Wasserstein distance
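
For orientation, the uniform random negative sampling that the abstract describes as the common baseline can be sketched as follows. This is a minimal illustration and not the method proposed in the paper: the function name `uniform_negative_sample`, the `max_tries` parameter, and the toy entities and relation are all assumptions made for this example. The sketch corrupts either the head or the tail of a positive triple with an entity drawn uniformly at random and rejects corruptions that are already known positives.

```python
import random


def uniform_negative_sample(triple, entities, known, rng, max_tries=100):
    """Return a negative triple by uniformly corrupting the head or the tail.

    triple:   a positive (head, relation, tail) tuple
    entities: list of all entity names
    known:    set of known positive triples, used to reject false negatives
    rng:      a random.Random instance, passed in for reproducibility
    """
    head, relation, tail = triple
    for _ in range(max_tries):
        candidate = rng.choice(entities)  # every entity is equally likely
        if rng.random() < 0.5:
            corrupted = (candidate, relation, tail)  # replace the head
        else:
            corrupted = (head, relation, candidate)  # replace the tail
        if corrupted not in known:
            return corrupted
    raise RuntimeError("no negative triple found within max_tries")


# Hypothetical toy knowledge graph, for illustration only.
entities = ["Paris", "France", "Berlin", "Germany"]
known = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
}
positive = ("Paris", "capital_of", "France")
negative = uniform_negative_sample(positive, entities, known, random.Random(0))
print(negative)
```

Because every entity is equally likely to be chosen, a sampler of this kind can return corruptions such as a country in the head position of `capital_of`, which an embedding model can reject easily. This is consistent with the abstract's observation that such negatives contribute little to training.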
