Computer Engineering ›› 2009, Vol. 35 ›› Issue (1): 43-45,4. doi: 10.3969/j.issn.1000-3428.2009.01.014

• Software Technology and Database •

Clustering of Search Engine Query Log

ZHANG Yu-lian, LI Yan-wei, WANG Quan, YUAN Fu-yong   

  1. (College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004)
  • Online: 2009-01-05 Published: 2009-01-05

Abstract: In recent years, with the development of search engine technology and Web data mining technology, extracting useful information from search engine query logs has become an important research direction. This paper discusses the advantages and disadvantages of the clustering algorithm proposed by Beeferman and of the improved algorithm proposed by Chan. It then proposes a new improved algorithm, based on users' degree of interest in Web pages (the user profile), that further weakens the influence of noisy data. A simulation experiment comparing the three algorithms shows that the new algorithm performs better than both the Beeferman algorithm and the Chan algorithm.

Key words: user profile, search engine query log, data mining
