
Computer Engineering ›› 2023, Vol. 49 ›› Issue (5): 48-55. doi: 10.19678/j.issn.1000-3428.0064148

• Artificial Intelligence and Pattern Recognition •

Quality Verification Method for Knowledge Graph Based on Semantic and Structural Trustworthiness

YE Qi1, ZHANG Yiqian1, RUAN Tong1, DU Wen2

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China;
  2. DS Information Technology Co., Ltd., Shanghai 200032, China
  • Received: 2022-03-10 Revised: 2022-06-08 Published: 2022-08-19
  • About the authors: YE Qi (叶琪, born 1976), female, lecturer, Ph.D., main research interest: knowledge graphs; ZHANG Yiqian (张一乾), master's student; RUAN Tong (阮彤), professor, Ph.D.; DU Wen (杜渂), senior engineer, M.S.
  • Funding:
    National Key Research and Development Program of China (2021YFC2701800, 2021YFC2701801).

Abstract: Knowledge graphs are widely used in artificial intelligence tasks such as question answering and information retrieval owing to their strong expressive power and interpretability. In practice, however, the heavy use of automated knowledge graph construction techniques inevitably introduces noise and conflicts, which seriously degrade the performance of downstream applications. To detect potential noise in a knowledge graph, preserve authentic and credible triples, and supply high-quality knowledge to downstream tasks, a triple evaluation model based on dual semantic and structural trustworthiness is proposed. The model consists of a semantic authenticity evaluator and a structural authenticity evaluator. The former converts triples into sentence sequences through specific rules and measures semantic authenticity with the Bidirectional Encoder Representations from Transformers (BERT) model. The latter obtains vector representations of entities and relations from a representation learning model and measures structural authenticity at two levels: knowledge representation and path features. Experimental results on four real-world knowledge graph datasets show that the proposed model improves accuracy, precision, recall, F1 score, and other evaluation metrics by 3% to 4% over models such as TransE-Random Forest Classifier (TransE-RFC), TransE-K Nearest Neighbor Classifier (TransE-KNC), and TransE-eXtreme Gradient Boosting (TransE-XGB). The proposed model can effectively detect noise errors in noisy knowledge graph datasets while retaining authentic and credible knowledge to the greatest possible extent.

Key words: knowledge graph, quality verification, triple trustworthiness evaluation, semantic authenticity, structural trustworthiness
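
The two evaluators described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the conversion rules or name the representation learning model, so the relation templates, the TransE-style distance (chosen only because the baselines are TransE-based), and the exp(-d) confidence mapping are all assumptions made for illustration. The BERT scoring step and the path-level features are omitted.

```python
import math

# Hypothetical relation templates; the paper's actual conversion rules
# are not stated in the abstract.
TEMPLATES = {
    "born_in": "{head} was born in {tail}.",
    "capital_of": "{head} is the capital of {tail}.",
}


def triple_to_sentence(head, relation, tail):
    """Turn a (head, relation, tail) triple into a sentence that a
    language model such as BERT could score for plausibility."""
    # Fall back to a generic pattern when no template is registered.
    fallback = "{head} " + relation.replace("_", " ") + " {tail}."
    template = TEMPLATES.get(relation, fallback)
    return template.format(head=head, tail=tail)


def transe_distance(h, r, t):
    """TransE-style L2 distance ||h + r - t||; smaller means the triple
    fits the embedding space better."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def structural_confidence(h, r, t):
    """Map the distance into (0, 1] with exp(-d). This mapping is an
    illustrative assumption, not the paper's formula."""
    return math.exp(-transe_distance(h, r, t))


# Toy usage with made-up 2-dimensional embeddings.
print(triple_to_sentence("Paris", "capital_of", "France"))
print(structural_confidence([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))
```

In a fuller pipeline, the sentence would be passed to a fine-tuned BERT classifier to obtain a semantic score, and both scores would feed a final trustworthiness decision; how the paper combines them is not stated in the abstract.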
