Abstract: The k-nearest neighbor search algorithm (KNNs) cannot meet the distribution, real-time, and scalability requirements of data mining. To address this problem, a P2P-based self-adaptive distributed k-nearest neighbor search algorithm (P2PAKNNs) is proposed. The paper describes the GHT* structure, defines a similarity function for high-dimensional data, HDSF(X, Y), and discusses the insertion, range search, and search algorithms in GHT*. The implementation process of P2PAKNNs is then given, and its correctness is verified by experiment.
Key words:
k-nearest neighbor search algorithm (KNNs),
metric space,
similarity query
CLC number:
YU Xiao-gao (余小高); YU Xiao-peng (余小鹏). P2P-based Self-adaptive Distributed k-nearest Neighbor Search Algorithm[J]. Computer Engineering (计算机工程), 2009, 35(19): 49-52,5.
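
The abstract names three operations in GHT*: insertion, range search, and k-nearest neighbor search. As a rough illustration only, the sketch below shows these operations on a single-machine generalized hyperplane tree, assuming GHT* builds on that structure as its name suggests. It is not the authors' P2PAKNNs and does nothing distributed: Euclidean distance stands in for HDSF(X, Y), whose definition the abstract does not give, and the names `GHTNode`, `capacity`, `range_search`, and `knn` are hypothetical.

```python
import heapq
import itertools
import math


def euclidean(x, y):
    """Stand-in metric (assumption); the paper's HDSF(X, Y) is not given here."""
    return math.dist(x, y)


class GHTNode:
    """Node of a single-machine generalized hyperplane tree (illustrative)."""

    def __init__(self, capacity=4, dist=euclidean):
        self.capacity = capacity      # max objects in a leaf before it splits
        self.dist = dist
        self.bucket = []              # objects held while the node is a leaf
        self.p1 = self.p2 = None      # the two pivots of an inner node
        self.left = self.right = None

    def is_leaf(self):
        return self.left is None

    def insert(self, obj):
        node = self
        while not node.is_leaf():
            # Objects at least as close to p1 as to p2 go to the left subtree.
            closer_to_p1 = node.dist(obj, node.p1) <= node.dist(obj, node.p2)
            node = node.left if closer_to_p1 else node.right
        node.bucket.append(obj)
        if len(node.bucket) > node.capacity:
            node._split()

    def _split(self):
        p1 = self.bucket[0]
        p2 = next((o for o in self.bucket[1:] if self.dist(p1, o) > 0), None)
        if p2 is None:
            return  # all objects coincide, so keep an oversized leaf
        objs, self.bucket = self.bucket, []
        self.p1, self.p2 = p1, p2
        self.left = GHTNode(self.capacity, self.dist)
        self.right = GHTNode(self.capacity, self.dist)
        for o in objs:
            self.insert(o)  # re-route the old bucket through the new pivots

    def range_search(self, q, r):
        """Return all stored objects within distance r of q."""
        if self.is_leaf():
            return [o for o in self.bucket if self.dist(q, o) <= r]
        d1, d2 = self.dist(q, self.p1), self.dist(q, self.p2)
        hits = []
        if d1 - r <= d2 + r:   # query ball may reach the left half-space
            hits += self.left.range_search(q, r)
        if d1 + r >= d2 - r:   # query ball may reach the right half-space
            hits += self.right.range_search(q, r)
        return hits

    def knn(self, q, k):
        """Return up to k stored objects nearest to q, closest first."""
        if k <= 0:
            return []
        best = []                 # max-heap of (-distance, tie_breaker, obj)
        tie = itertools.count()
        stack = [(0.0, self)]     # (lower bound on distance to q, node)
        while stack:
            bound, node = stack.pop()
            radius = -best[0][0] if len(best) == k else math.inf
            if bound > radius:
                continue          # nothing in this subtree can improve best
            if node.is_leaf():
                for o in node.bucket:
                    d = node.dist(q, o)
                    if len(best) < k:
                        heapq.heappush(best, (-d, next(tie), o))
                    elif d < -best[0][0]:
                        heapq.heapreplace(best, (-d, next(tie), o))
                continue
            d1, d2 = node.dist(q, node.p1), node.dist(q, node.p2)
            # By the triangle inequality, (d1 - d2) / 2 bounds the left subtree.
            children = [(max(bound, (d1 - d2) / 2), node.left),
                        (max(bound, (d2 - d1) / 2), node.right)]
            children.sort(key=lambda c: c[0], reverse=True)
            stack.extend(children)  # the more promising child is popped first
        return [o for _, _, o in sorted(best, reverse=True)]
```

In this sketch the k-nearest neighbor query behaves like a range search whose radius shrinks to the distance of the current k-th best candidate, so subtrees whose lower bound exceeds that radius are skipped.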