Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology •

基于HITS的冲突Deep Web数据多真值发现算法

王继奎1,李少波2   

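  1. (1. Comprehensive Key Laboratory of Electronic Commerce, Lanzhou University of Finance and Economics, Lanzhou 730000; 2. Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, Guizhou University, Guiyang 550003)
  • Received: 2015-08-17 Published: 2016-09-15 Released: 2016-09-15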
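  • About the authors: WANG Jikui (born 1978), male, associate professor, Ph.D.; main research interests: data processing and data integration. LI Shaobo, professor and doctoral supervisor.
  • Funding:
    National Social Science Foundation of China project "Research on Air Quality Measurement Methods Based on Big Data Integration" (14GSD95); National Statistical Science Research Foundation key project "Research on Collection, Storage, and Analysis Schemes for Massive Heterogeneous Multi-Source Data" (2013LZ44); Longyuan Innovative Talent Support Program (14GSD95); Gansu Provincial Department of Finance Fundamental Research Funds for Universities (GZ14007, GZ14023).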

Multiple Truth Value Discovery Algorithm for Conflicting Deep Web Data Based on HITS

WANG Jikui1, LI Shaobo2

  1. (1. Comprehensive Key Laboratory of Electronic Commerce, Lanzhou University of Finance and Economics, Lanzhou 730000, China; 2. Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, Guizhou University, Guiyang 550003, China)
  • Received: 2015-08-17 Online: 2016-09-15 Published: 2016-09-15

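Abstract (Chinese version): Most current truth discovery algorithms are built on the premise that the true value is unique, so they cannot handle cases with multiple true values. To address the multi-truth discovery problem for conflicting Deep Web data, this work borrows the idea of the HITS algorithm and defines view authority and view description credibility, which influence each other. On this basis, a view link graph is defined and an iterative multi-truth discovery algorithm, MTF, is proposed. When the algorithm converges, the view with the highest authority is the truth value. Experiments on the Book-Authors dataset show that MTF greatly improves precision compared with the baseline VOTE algorithm.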

Key words (Chinese version): Web data source, data model, credibility, view, truth discovery

Abstract: Most existing truth value discovery algorithms assume that there is only one truth value and therefore cannot handle cases with multiple truth values. To address the multiple truth value discovery problem in conflicting Deep Web data, this paper draws on the idea of the Hypertext-Induced Topic Search (HITS) algorithm to define the authority of a view and the credibility of a view description, two quantities that depend on each other. On this basis, it constructs a link graph of views and proposes an iterative multiple truth value discovery algorithm named MTF. When the algorithm converges, the view with the maximum authority is taken as the truth value. Experimental results on the Book-Authors dataset show that MTF achieves substantially higher accuracy than the baseline VOTE algorithm.

Key words: Web data source, data model, credibility, view, truth value discovery
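
The mutual reinforcement between view authority and description credibility described in the abstract can be illustrated with a generic HITS-style iteration. The sketch below is not the paper's MTF algorithm, whose exact definitions the abstract does not give. It assumes that each source asserts one view (a set of values), that a description supports a view in proportion to their Jaccard overlap, and that scores are L2-normalized as in standard HITS. The names `claims`, `jaccard`, `vote`, and `hits_style_authority` are hypothetical.

```python
from collections import Counter
from math import sqrt


def jaccard(a, b):
    """Overlap between two views (sets of values); an assumed link weight."""
    return len(a & b) / len(a | b)


def vote(claims):
    """Baseline: pick the view asserted verbatim by the most sources."""
    return Counter(claims.values()).most_common(1)[0][0]


def hits_style_authority(claims, max_iter=1000, tol=1e-12):
    """Iterate authority (views) and credibility (descriptions) to a fixed point."""
    views = sorted(set(claims.values()), key=sorted)
    sources = sorted(claims)
    # weight[s][v]: how strongly the description from source s supports view v
    weight = {s: {v: jaccard(claims[s], v) for v in views} for s in sources}
    authority = {v: 1.0 for v in views}
    for _ in range(max_iter):
        # A description is credible if it supports authoritative views.
        credibility = {
            s: sum(weight[s][v] * authority[v] for v in views) for s in sources
        }
        # A view is authoritative if credible descriptions support it.
        updated = {
            v: sum(weight[s][v] * credibility[s] for s in sources) for v in views
        }
        norm = sqrt(sum(x * x for x in updated.values()))
        updated = {v: x / norm for v, x in updated.items()}
        delta = max(abs(updated[v] - authority[v]) for v in views)
        authority = updated
        if delta < tol:
            break
    return authority


# Hypothetical claims about one book's author set.
claims = {
    "s1": frozenset({"X"}),
    "s2": frozenset({"X"}),
    "s3": frozenset({"X"}),
    "s4": frozenset({"A", "B"}),
    "s5": frozenset({"A", "B"}),
    "s6": frozenset({"A"}),
}

authority = hits_style_authority(claims)
print(sorted(vote(claims)))                         # ['X']
print(sorted(max(authority, key=authority.get)))    # ['A', 'B']
```

In this toy input, exact-match voting selects {X} because three sources repeat it verbatim, while the iterative scores favor {A, B} because the overlapping views {A, B} and {A} reinforce each other. Whether MTF behaves the same way depends on how the paper defines the view link graph.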

CLC Number: