
Computer Engineering ›› 2025, Vol. 51 ›› Issue (3): 64-75. doi: 10.19678/j.issn.1000-3428.0068839

• Artificial Intelligence and Pattern Recognition •

Entity Similarity Metrics and Alignment Method Based on the Union of Multi-Side Information Representations

ZHU Hong1, WANG Kuoran1,*, ZHU Tong2

  1. School of Artificial Intelligence, China University of Mining and Technology (Beijing), Beijing 100083, China
  2. Archives, China University of Mining and Technology (Beijing), Beijing 100083, China
  • Received: 2023-11-14 Online: 2025-03-15 Published: 2025-03-17
  • Contact: WANG Kuoran
  • Funding:
    2022 Beijing Municipal Archives Bureau Scientific Research Project

Abstract:

Entity alignment aims to identify different instances of the same object across different knowledge graphs. However, heterogeneity between graphs makes the structures and representations of equivalent instances inconsistent, which reduces alignment accuracy. This paper proposes an entity similarity measurement method for heterogeneous graphs that combines representations of an entity's main information and its multi-side information, and applies it to the entity alignment task. The main information comprises the entity name and description; the side information comprises entity attributes, relations, and the descriptions of related entities. To address the alignment interference caused by structural heterogeneity between equivalent entities in different graphs, a similarity metric method called UnMuSIR-SM&EA, which combines the semantic representations of the multi-side information of entities, is proposed for entity alignment. To make the representations of synonymous information more consistent, a representation learning model is introduced to obtain semantic representations of each type of entity information. To solve the problem that anisotropy in the embedding space of the representation learning model leads to inconsistent similarity scales for synonyms, a fine-tuning method based on contrastive learning over entity main information is designed to optimize the semantic representations of entity information. Experimental results show that on the DIS_ZH-EN dataset, which has large structural differences, the proposed method reaches a Hits@1 of 95.2%, which is 16.8 percentage points higher than BERT-INT, a model based on side information. On the DBP15K_ZH-EN, DBP15K_JA-EN, and DBP15K_FR-EN subsets of DBP15K, Hits@1 is 95.7%, 96.0%, and 98.9%, respectively. On the DBP-WD dataset, Hits@1 is 99.4%. The proposed model performs very well on entity alignment tasks.
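As a rough illustration of how a main-information similarity and several side-information similarities might be combined into a single score, the sketch below uses cosine similarity on toy vectors. The best-match pooling over side items, the equal weighting of side types, the `main_weight` parameter, and all function names are assumptions made for illustration; they are not the fusion rule of UnMuSIR-SM&EA, which the abstract does not specify.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def side_similarity(items_a, items_b):
    """Average, over the items of A, of the best cosine match in B (assumed pooling)."""
    if not items_a or not items_b:
        return 0.0
    best = [max(cosine(a, b) for b in items_b) for a in items_a]
    return sum(best) / len(best)

def fused_similarity(entity_a, entity_b, main_weight=0.5):
    """Weighted sum of main similarity and mean side similarity (assumed weights)."""
    main = cosine(entity_a["main"], entity_b["main"])
    sides = [
        side_similarity(items, entity_b["sides"].get(kind, []))
        for kind, items in entity_a["sides"].items()
    ]
    side = sum(sides) / len(sides) if sides else 0.0
    return main_weight * main + (1.0 - main_weight) * side

# Toy 2-dimensional "embeddings"; real ones would come from a language model.
e1 = {"main": [1.0, 0.0],
      "sides": {"attributes": [[1.0, 0.0], [0.0, 1.0]], "relations": [[1.0, 1.0]]}}
e2 = {"main": [1.0, 0.0],
      "sides": {"attributes": [[0.0, 1.0], [1.0, 0.0]], "relations": [[2.0, 2.0]]}}
e3 = {"main": [0.0, 1.0],
      "sides": {"attributes": [[0.0, 1.0]], "relations": [[1.0, -1.0]]}}

score_match = fused_similarity(e1, e2)
score_other = fused_similarity(e1, e3)
print(score_match > score_other)
```

In this toy setup `e2` lists the same attributes as `e1` in a different order, so best-match pooling still scores it highly; this is one simple way to make a score less sensitive to structural differences between graphs.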

Key words: entity alignment, knowledge graphs, similarity metrics, contrastive learning, pre-trained model
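The Hits@1 figures reported in the abstract measure the share of source entities whose true counterpart is ranked first among all candidates. A minimal sketch of that metric follows, assuming a precomputed similarity matrix in which row i scores source entity i against every target entity, and a list `gold` giving the index of the correct target for each row. Both inputs and the function name are illustrative, not the paper's evaluation code.

```python
def hits_at_k(similarity, gold, k=1):
    """Fraction of rows whose gold column is among the k highest-scoring columns."""
    if not similarity:
        return 0.0
    hits = 0
    for row, gold_idx in zip(similarity, gold):
        # Rank is the number of candidates scoring strictly higher than the gold
        # one, so ties are resolved in favour of the gold candidate (an assumption).
        rank = sum(1 for score in row if score > row[gold_idx])
        if rank < k:
            hits += 1
    return hits / len(similarity)

# Toy example: 3 source entities, 3 target candidates each.
toy_similarity = [
    [0.9, 0.1, 0.3],  # gold is column 0 and ranked first
    [0.2, 0.4, 0.8],  # gold is column 1 but column 2 scores higher
    [0.1, 0.2, 0.7],  # gold is column 2 and ranked first
]
toy_gold = [0, 1, 2]

print(hits_at_k(toy_similarity, toy_gold, k=1))
print(hits_at_k(toy_similarity, toy_gold, k=2))
```

On the toy data two of the three gold targets are ranked first, and all three fall within the top two.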