
Computer Engineering ›› 2024, Vol. 50 ›› Issue (1): 339-347. doi: 10.19678/j.issn.1000-3428.0066820

• Development Research and Engineering Application •

Entity Alignment of Petroleum Data Assets Graph Based on Multi-Neighborhood Awareness

Zhibao WANG1,2,*, Shutao JIANG1, Fei LI3, Juntao GAO1, Qiang MA3, Bin YANG1

  1. School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China
  2. Bohai-Rim Energy Research Institute, Northeast Petroleum University, Qinhuangdao 066004, Hebei, China
  3. College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, Heilongjiang, China
  • Received: 2023-01-28 Published: 2024-01-15 Online: 2024-01-11
  • Contact: Zhibao WANG
  • Supported by:
    Heilongjiang Province Higher Education Teaching Reform Project (SJGY20200125); Sinopec Science and Technology Research Project (33550000-20-ZC0613-0098)

Abstract:

Entity alignment plays a crucial role in the automatic fusion of multi-source heterogeneous knowledge graphs of petroleum data assets. Mainstream entity alignment models based on graph neural networks focus mainly on entity and graph-structure information and ignore the semantic information carried by an entity's multiple neighborhoods, namely the relations between entities, the attributes, and the attribute values. As a result, they perform only moderately when fusing petroleum data asset knowledge graphs, which are characterized by widely differing naming rules, industry-specific conventions, and a large number of semantic entities. This study proposes a Multi-Neighborhood Awareness Network (MNAN) model for entity alignment, an improvement on the graph attention network. A BERT-based multilingual pre-trained model provides the initial features of entities and their multi-neighborhoods. A graph convolutional network with highway networks then aggregates neighboring-entity and graph-structure features, and a multi-neighborhood-aware, entity-enhanced attention network aggregates the multi-neighborhood features of each entity. The model is trained by minimizing a margin-based loss function. Entity alignment experiments are conducted on two knowledge graphs from a petroleum data asset knowledge graph dataset. The results show that MNAN outperforms all of the compared graph-neural-network-based entity alignment models, reaching a Hits@1 of 86.7%, approximately 2.3 percentage points higher than the best-performing comparison model.

Key words: entity alignment, multi-neighborhood awareness, graph attention network, data assets in the petroleum field, knowledge graph (KG)
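
For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of a margin-based loss and the Hits@k metric as they are commonly defined for entity alignment. It is not the authors' implementation: the L1 distance, the function names (`l1_distance`, `margin_loss`, `hits_at_k`), the default margin, and the convention that the i-th source entity aligns with the i-th target entity are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]


def l1_distance(a: Vector, b: Vector) -> float:
    """Manhattan distance between two embeddings (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def margin_loss(pos_pairs: List[Pair], neg_pairs: List[Pair],
                margin: float = 1.0) -> float:
    """Sum of hinge terms max(0, d(pos) - d(neg) + margin).

    Each aligned (positive) pair is matched with one negative pair; the
    loss is zero once every negative pair is at least `margin` farther
    apart than its positive pair.
    """
    total = 0.0
    for (p1, p2), (n1, n2) in zip(pos_pairs, neg_pairs):
        total += max(0.0, l1_distance(p1, p2) - l1_distance(n1, n2) + margin)
    return total


def hits_at_k(source: List[Vector], target: List[Vector], k: int = 1) -> float:
    """Fraction of source entities whose true counterpart is in the top k.

    Assumes source[i] is aligned with target[i] (hypothetical convention).
    """
    hits = 0
    for i, src in enumerate(source):
        # Rank all target entities by distance to this source entity.
        ranked = sorted(range(len(target)),
                        key=lambda j: l1_distance(src, target[j]))
        if i in ranked[:k]:
            hits += 1
    return hits / len(source)
```

Under these assumptions, a Hits@1 of 86.7% means that for 86.7% of test entities the nearest candidate in the other knowledge graph is the correct counterpart.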