摘要: 不确定数据库中的概率阈值top-k查询是计算元组排在前k位的概率和,返回概率和不小于p的元组,但现有的查询语义没有将x-tuple内的元组进行整体处理。针对该情况,定义一种新的查询语义——概率阈值x-top-k查询,并给出查询处理算法。在该查询语义下采用动态规划方法求取x-tuple内每个元组排在前k位的概率和,对其进行聚集后做概率阈值top-k查询,并利用观察法、最大上限值等剪枝方法进行优化。实验结果表明,该算法平均扫描全体数据集中60%的数据即可返回正确结果集,证明其查询处理效率较高。
关键词:
不确定数据库,
概率阈值top-k查询,
x-元组,
动态规划算法,
聚集
Abstract: Probabilistic threshold top-k query calculation sum of the probability of the tuple ranked top-k and return the tuples whose sum of the probability are at least p. But top-k query does not take x-tuple as a whole, thus a new top-k query semantic probabilistic threshold x-top-k query is defined and an algorithm is given to process it, which uses dynamic method to acquire sum of the probability of the tuple, then process aggregate probabilities with top-k query. It uses several pruning methods like the upper bound method and so on to optimize the algorithm. Experimental result shows that the algorithm return the answer set for scanning about 60% of data set, and it demonstrates that the algorithm is efficient.
Key words:
uncertain database,
probabilistic threshold top-k query,
x-tuple,
dynamic programming algorithm,
aggregation
中图分类号:
黄冬梅, 舒博, 王建, 熊中敏. 基于x-tuple的概率阈值top-k查询算法[J]. 计算机工程, 2013, 39(4): 44-47.
HUANG Dong-Mei, SHU Bo, WANG Jian, XIONG Zhong-Min. Probabilistic Threshold top-k Query Algorithm Based on x-tuple[J]. Computer Engineering, 2013, 39(4): 44-47.