Computer Engineering ›› 2024, Vol. 50 ›› Issue (4): 150-159. doi: 10.19678/j.issn.1000-3428.0067814

• Artificial Intelligence and Pattern Recognition •

Entity Alignment Based on Dynamic Graph Attention and Label Propagation

MO Shaocong, CHEN Qingfeng, XIE Ze, LIU Chunyu, QIU Junlai

  1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, Guangxi, China
  • Received: 2023-06-07 Revised: 2023-07-21 Published: 2024-04-22
  • Corresponding author: CHEN Qingfeng, E-mail: 20090016@gxu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61963004, 61862006); Guangxi Natural Science Foundation (2017GXNSFDA198033).

Abstract: Entity alignment is an effective approach to multi-source database fusion; it aims to identify co-referring entities across multi-source knowledge graphs. Recently, Graph Convolutional Networks (GCNs) have emerged as a new paradigm for representation learning in entity alignment. However, organizations construct knowledge graphs with very different objectives and rules, so entity alignment models must accurately capture the features of long-tail entities across knowledge graphs. Moreover, existing GCN-based entity alignment models overemphasize the structural representation of relation triples and neglect the rich semantic information in attribute triples. To address these issues, an entity alignment model is proposed that uses a dynamic graph attention network to aggregate attribute-structure triple representations, reducing the impact of irrelevant attribute structures on entity representations. In addition, to alleviate relation heterogeneity in knowledge graphs, multi-dimensional label propagation is introduced to compress the different dimensions of the entity adjacency matrix; entity features are then propagated along the compressed adjacency relations to obtain a relation-structure representation. Finally, a linear programming algorithm iterates over the similarity matrix of the entity representations to produce the final alignment. Experiments are conducted on the public datasets EN-FR-15K and EN-ZH-15K and on the Chinese medical dataset MED-BBK-9K. On these three datasets, the model achieves Hits@1 of 0.942, 0.926, and 0.427, Hits@10 of 0.963, 0.952, and 0.604, and Mean Reciprocal Rank (MRR) of 0.949, 0.939, and 0.551, respectively. Ablation experiments also verify the effectiveness of each module.
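
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Hits@k and MRR are commonly computed from a source-by-target similarity matrix. It is not the authors' evaluation code: the function names, the pessimistic tie-breaking rule, and the toy similarity values are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence


def gold_ranks(sim: Sequence[Sequence[float]], gold: Sequence[int]) -> List[int]:
    """Return the 1-based rank of the true counterpart for each source entity.

    sim[i][j] is the similarity between source entity i and target entity j;
    gold[i] is the index of the true counterpart of source entity i.
    Assumption: ties are broken pessimistically, so tied candidates are
    counted as ranking ahead of the gold target.
    """
    ranks = []
    for row, g in zip(sim, gold):
        gold_score = row[g]
        # Number of candidates scoring at least as high as gold (gold included).
        ranks.append(sum(1 for s in row if s >= gold_score))
    return ranks


def alignment_metrics(
    sim: Sequence[Sequence[float]],
    gold: Sequence[int],
    ks: Sequence[int] = (1, 10),
) -> Dict[str, float]:
    """Compute Hits@k for each k in ks, plus Mean Reciprocal Rank (MRR)."""
    ranks = gold_ranks(sim, gold)
    if not ranks:
        raise ValueError("at least one source entity is required")
    n = len(ranks)
    # Hits@k: fraction of source entities whose gold target is in the top k.
    metrics = {f"Hits@{k}": sum(r <= k for r in ranks) / n for k in ks}
    # MRR: mean of the reciprocal rank of the gold target.
    metrics["MRR"] = sum(1.0 / r for r in ranks) / n
    return metrics


# Toy 3x3 similarity matrix (illustrative values, not taken from the paper).
toy_sim = [
    [0.9, 0.1, 0.3],  # gold target 0 is ranked 1st
    [0.2, 0.4, 0.8],  # gold target 1 is ranked 2nd
    [0.5, 0.6, 0.1],  # gold target 2 is ranked 3rd
]
toy_gold = [0, 1, 2]
toy_metrics = alignment_metrics(toy_sim, toy_gold, ks=(1, 2))
print(toy_metrics)
```

Under these definitions, a Hits@1 of 0.942 would mean that the true counterpart is the top-ranked candidate for 94.2% of the test entities, while MRR also rewards near misses by averaging the reciprocal of the gold target's rank.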

Key words: database fusion, Graph Convolutional Network (GCN), entity alignment, label propagation, linear programming