Computer Engineering ›› 2023, Vol. 49 ›› Issue (9): 89-98. doi: 10.19678/j.issn.1000-3428.0065447

• Artificial Intelligence and Pattern Recognition •

Research on Domain Knowledge Graph Inference Technology Based on Graph Representation Learning

Guohua SUI1, Taoran LI2, Hao LIU2, Lin CHEN1, Wei WANG2

  1. Geophysical Prospecting Research Institute of Shengli Oilfield, China Petroleum and Chemical Corporation, Dongying 257022, Shandong, China
  2. School of Computer Science, Fudan University, Shanghai 200438, China
  • Received: 2022-08-05 Online: 2023-09-15 Published: 2023-09-14
  • About the authors:

    Guohua SUI (隋国华, born 1969), male, senior engineer, Ph.D.; main research interests: petroleum big data development and application

    Taoran LI (李陶然), master's degree

    Hao LIU (刘昊), Ph.D. candidate

    Lin CHEN (陈林), senior engineer

    Wei WANG (汪卫), professor, Ph.D.

  • Funding:
    National Key Research and Development Program of China (2018YFB1403200)

Abstract:

Most existing inference models for domain knowledge graphs are adapted from models designed for general encyclopedic knowledge graphs, and they do not properly handle the heterogeneity of domain knowledge graphs. Existing research also treats relation prediction and triple classification as two independent tasks, ignoring the connection between them, and does not make full use of domain knowledge when building domain models. To address these issues, TransSep, an improved inference model based on translation distance, is proposed; it assigns different feature spaces to heterogeneous entity types. A joint training strategy is also proposed in which relation prediction and triple classification guide each other's negative sampling and alternately learn the entity embeddings, improving the training of both tasks. Taking a medical knowledge graph as an example, domain knowledge is introduced into TransSep through the idea of meta-paths to enhance the model's expressive power. Experiments on the precision medicine knowledge graph constructed by Fudan University show that, compared with TransE, DistMult, TriModel, and other models, TransSep improves the MR score by at least 17.4% on the relation prediction task and raises the F1 score on the triple classification task to 0.9286.

Key words: domain knowledge graph, knowledge inference, graph representation learning, graph neural network, meta-path
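
The abstract does not state TransSep's scoring function, so the following is only a rough illustration of the general idea it describes: a translation-distance score in which each entity type is mapped through its own feature space. It assumes a TransE-style L1 distance, one projection matrix per entity type, and a margin ranking loss over a corrupted (negative) triple. All function names, entity names, and toy values are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the abstract does not give TransSep's actual
# scoring function. Every name and value below is a hypothetical assumption.

def project(matrix, vector):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def translation_distance(head, relation, tail, head_space, tail_space):
    """Return the L1 distance ||M_head*h + r - M_tail*t||_1.

    head_space and tail_space are projection matrices selected by entity
    type, so each type lives in its own feature space (an assumption).
    A lower distance means the triple is considered more plausible.
    """
    h = project(head_space, head)
    t = project(tail_space, tail)
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, relation, t))


def margin_loss(positive_score, negative_score, margin=1.0):
    """Margin ranking loss, widely used for translation-distance models."""
    return max(0.0, margin + positive_score - negative_score)


# One projection matrix per entity type (toy 2-D values).
TYPE_SPACES = {
    "drug": [[1.0, 0.0], [0.0, 1.0]],     # identity mapping
    "disease": [[2.0, 0.0], [0.0, 2.0]],  # uniform scaling
}

drug_a = [1.0, 0.0]     # hypothetical entity of type "drug"
treats = [1.0, 2.0]     # hypothetical relation vector
disease_b = [1.0, 1.0]  # hypothetical entity of type "disease"
disease_c = [0.0, 0.0]  # corrupted tail used as a negative sample

pos = translation_distance(drug_a, treats, disease_b,
                           TYPE_SPACES["drug"], TYPE_SPACES["disease"])
neg = translation_distance(drug_a, treats, disease_c,
                           TYPE_SPACES["drug"], TYPE_SPACES["disease"])
loss = margin_loss(pos, neg)
```

With these toy values the true triple has distance 0.0 and the corrupted triple has distance 4.0, so the margin loss is already zero. In a joint training scheme like the one the abstract describes, how negatives such as `disease_c` are chosen would be guided by the other task; that selection logic is not sketched here.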