Computer Engineering ›› 2006, Vol. 32 ›› Issue (20): 38-39,7. doi: 10.3969/j.issn.1000-3428.2006.20.014

• Degree Paper •

Efficient Clustering Algorithm Used for Web Search

LI Xinye, YUAN Jinsha   

  1. (Dept. of Electronic and Communication Engineering, North China Electric Power Univ., Baoding 071003)
  • Online: 2006-10-20 Published: 2006-10-20

Abstract: This paper studies a clustering algorithm for user access patterns, based on the query log of a search engine. It shows that the Jaccard coefficient and weighted similarity measures are poorly suited to clustering user access patterns, proposes an improved Hamming distance formula, and gives a clustering algorithm that uses this distance as its similarity measure. Analysis of the algorithm indicates that the approach, which is based on a bipartite graph and the improved Hamming distance formula, is accurate and efficient.

Key words: Clustering, Hamming distance, Search engine
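
To make the two measures named in the abstract concrete, the sketch below compares the standard Jaccard coefficient with the standard Hamming distance on a toy query-to-URL click graph, then groups queries with a simple distance threshold. Everything here is an assumption for illustration: the `clicks` data, the function names, and the greedy threshold grouping are hypothetical, and the formulas are the textbook definitions. The paper's improved Hamming distance formula and its actual clustering algorithm are not stated in the abstract and are not reproduced here.

```python
from typing import Dict, List, Set

# Hypothetical toy query log viewed as a bipartite graph:
# each query (left side) links to the set of URLs users clicked (right side).
clicks: Dict[str, Set[str]] = {
    "cheap flights": {"u1", "u2", "u3"},
    "airline tickets": {"u2", "u3", "u4"},
    "python tutorial": {"u5", "u6"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Standard Jaccard coefficient: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def hamming(a: Set[str], b: Set[str]) -> int:
    """Standard Hamming distance between the binary click vectors of two
    queries, i.e. the number of URLs clicked for exactly one of them."""
    return len(a ^ b)


def cluster_by_distance(log: Dict[str, Set[str]], max_dist: int) -> List[List[str]]:
    """Greedy single-pass grouping (an illustrative assumption, not the
    paper's algorithm): a query joins the first cluster whose first member
    is within max_dist, otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    for query, urls in log.items():
        for members in clusters:
            if hamming(urls, log[members[0]]) <= max_dist:
                members.append(query)
                break
        else:
            clusters.append([query])
    return clusters


if __name__ == "__main__":
    a, b = clicks["cheap flights"], clicks["airline tickets"]
    print(jaccard(a, b), hamming(a, b))
    print(cluster_by_distance(clicks, max_dist=2))
```

In this toy data the two flight-related queries share two of four distinct URLs, so they are close under both measures, while the unrelated query shares none. How the paper modifies the Hamming distance to suit access patterns would have to be taken from the full text.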
