
Computer Engineering ›› 2006, Vol. 32 ›› Issue (20): 38-39,7. doi: 10.3969/j.issn.1000-3428.2006.20.014

• Doctoral Dissertation •

Efficient Clustering Algorithm Used for Web Search

LI Xinye, YUAN Jinsha   

  1. (Dept. of Electronic and Communication Engineering, North China Electric Power Univ., Baoding 071003)
  • Online: 2006-10-20 Published: 2006-10-20

Abstract: This paper studies the clustering of user access patterns based on the query logs of a search engine. It explains why the Jaccard coefficient and weighted similarity measures fall short for clustering user access patterns, proposes an improved Hamming distance formula, and gives a clustering algorithm that uses this distance as its measure. Analysis of the algorithm indicates that the approach, which is based on a bipartite graph and the improved Hamming distance formula, is accurate and efficient.

Key words: Clustering, Hamming distance, Search engine
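
For readers unfamiliar with the two measures named in the abstract, the following is a minimal sketch of the textbook Jaccard coefficient and the textbook Hamming distance on a toy click log. It is not the paper's improved Hamming distance formula, which the abstract does not state. The data, the names `CLICKS`, `jaccard`, and `hamming`, and the reading of the bipartite graph as linking queries to clicked URLs are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Set

# Hypothetical toy click log (not from the paper): each query maps to the set
# of URLs clicked for it, i.e. its neighbours in an assumed query-URL
# bipartite graph.
CLICKS: Dict[str, Set[str]] = {
    "jaguar car": {"u1", "u2", "u3"},
    "jaguar price": {"u2", "u3"},
    "jaguar animal": {"u4"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Textbook Jaccard coefficient |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hamming(a: Set[str], b: Set[str]) -> int:
    """Textbook Hamming distance between the binary click vectors of a and b.

    This equals the number of URLs clicked for exactly one of the two queries.
    It is the unmodified distance, not the paper's improved variant.
    """
    return len(a ^ b)


if __name__ == "__main__":
    # Compare every pair of queries under both measures.
    for q1, q2 in combinations(CLICKS, 2):
        print(q1, "|", q2, jaccard(CLICKS[q1], CLICKS[q2]), hamming(CLICKS[q1], CLICKS[q2]))
```

In this toy log the two car-related queries share most of their clicked URLs, so they score high on similarity and low on distance, while the unrelated query shares none. A distance-based clustering step would then group items whose pairwise distance falls below some threshold; how the paper defines that step is not given in the abstract.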
