
Computer Engineering ›› 2009, Vol. 35 ›› Issue (1): 43-45,4. doi: 10.3969/j.issn.1000-3428.2009.01.014

• Software Technology and Database •

Clustering of Search Engine Query Log

ZHANG Yu-lian, LI Yan-wei, WANG Quan, YUAN Fu-yong

  1. (College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-01-05 Published: 2009-01-05

Abstract: With the development of search engine technology and Web data mining technology, extracting useful information from search engine query logs has become an active research topic. This paper discusses the strengths and weaknesses of the clustering algorithm proposed by Beeferman and of the improved algorithm proposed by Chan, and then proposes a further improved algorithm based on users' degree of interest in Web pages. The new algorithm further reduces the influence of noisy data. A simulation experiment compares the three algorithms and indicates that the new algorithm performs better than both the Beeferman algorithm and the Chan algorithm.

Key words: user interest, search engine query log, data mining
