Computer Engineering ›› 2019, Vol. 45 ›› Issue (4): 181-188. doi: 10.19678/j.issn.1000-3428.0049737

• Artificial Intelligence and Recognition Technology •

Top-k Entity Augmentation Algorithm Based on Consistent Supporting Degree

SUN Weijuan (孙伟娟), WANG Ning (王宁)

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2017-12-19 Online: 2019-04-15 Published: 2019-04-15
  • About the authors: SUN Weijuan (born 1993), female, master's student, whose main research interests are Web data integration and data mining; WANG Ning, professor, Ph.D.
  • Supported by:

    National Natural Science Foundation of China (61370060); Fundamental Research Funds for the Central Universities (2017YJS065).

Abstract:

Existing entity augmentation techniques return only a single result and apply only to augmenting a single attribute column; when several attribute columns are augmented, they tend to produce inconsistent entities. To address this, two top-k entity augmentation algorithms are proposed. Using the consistent matching degree between answer tables, they find, among a large number of Web tables, the k answer table sets with the highest consistent supporting degree and use them to fill in the missing information of the entities to be augmented. Experimental results show that both algorithms achieve top-k entity augmentation while keeping the results highly consistent and accurate. The algorithm based on consistent matching degree yields more diverse results, whereas the algorithm based on branch-and-bound performs better in terms of reliability.

Key words: top-k entity augmentation, Web table, branch-and-bound, consistent matching degree, data integration

CLC Number: