Abstract: Most existing truth value discovery algorithms assume a unique truth value and therefore cannot handle cases with multiple truth values. To address multiple truth value discovery in conflicting Deep Web data, this paper borrows the idea of the Hypertext-Induced Topic Search (HITS) algorithm and defines the authority of a view and the credibility of a view description, two quantities that influence each other. On this basis, it defines a link graph of views and proposes an iterative multiple truth value discovery algorithm named MTF. When the algorithm converges, the view with the maximum authority is taken as the truth value. Experimental results on the Book-Authors dataset show that MTF achieves substantially higher accuracy than the baseline VOTE algorithm.
Key words:
Web data source,
data model,
credibility,
view,
truth value discovery
CLC number:
WANG Jikui (王继奎), LI Shaobo (李少波). Multiple Truth Value Discovery Algorithm for Conflicting Deep Web Data Based on HITS[J]. Computer Engineering (计算机工程).
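The mutual reinforcement between view authority and description credibility mentioned in the abstract can be illustrated with a generic HITS-style iteration. This is only a minimal sketch under stated assumptions, not the MTF algorithm itself: the abstract does not give the paper's definitions, so the input structure `links` (each description mapped to the views it supports), the function name `hits_style_scores`, the sum-based update rules, and the L2 normalization are all hypothetical choices made for illustration.

```python
import math


def hits_style_scores(links, max_iter=100, tol=1e-9):
    """Generic HITS-style mutual reinforcement (illustrative, not MTF).

    links: dict mapping a description id to the set of view ids it
    supports. This structure is an assumption, not the paper's model.
    Returns (authority per view, credibility per description).
    """
    views = sorted({v for supported in links.values() for v in supported})
    if not views:
        return {}, {}
    authority = {v: 1.0 for v in views}
    credibility = {d: 1.0 for d in links}

    for _ in range(max_iter):
        # A description is credible if it supports authoritative views.
        new_cred = {d: sum(authority[v] for v in supported)
                    for d, supported in links.items()}
        norm = math.sqrt(sum(c * c for c in new_cred.values())) or 1.0
        new_cred = {d: c / norm for d, c in new_cred.items()}

        # A view is authoritative if credible descriptions support it.
        new_auth = {v: 0.0 for v in views}
        for d, supported in links.items():
            for v in supported:
                new_auth[v] += new_cred[d]
        norm = math.sqrt(sum(a * a for a in new_auth.values())) or 1.0
        new_auth = {v: a / norm for v, a in new_auth.items()}

        # Stop when the authority scores no longer change noticeably.
        delta = max(abs(new_auth[v] - authority[v]) for v in views)
        authority, credibility = new_auth, new_cred
        if delta < tol:
            break
    return authority, credibility


# Toy input (made up): view "A" is supported by three descriptions,
# view "B" by two, with one description supporting both.
links = {"d1": {"A"}, "d2": {"A"}, "d3": {"A", "B"}, "d4": {"B"}}
authority, credibility = hits_style_scores(links)
best_view = max(authority, key=authority.get)
```

In this toy input, `best_view` is the view that ends with the highest authority after the scores stabilize, mirroring the abstract's statement that the view with maximum authority is taken as the truth value at convergence. How MTF actually builds the link graph and weights the updates is not specified in the abstract.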