
Computer Engineering (计算机工程) ›› 2008, Vol. 34 ›› Issue (21): 40-41,4. doi: 10.3969/j.issn.1000-3428.2008.21.015

• Software Technology and Database •

大规模中文搜索日志中查询重复性分析

窦志成1,袁晓洁1,何松柏2   

  1. 南开大学信息技术科学学院,天津 300071;2. 军事交通学院汽车指挥系,天津 300161
  • 出版日期:2008-11-05 发布日期:2008-11-05

Analysis of Query Repetition in Large-scale Chinese Search Log

DOU Zhi-cheng1, YUAN Xiao-jie1, HE Song-bai2   

  1. College of Information Technical Science, Nankai University, Tianjin 300071; 2. Automobile Transport Command Department, Academy of Military Transport, Tianjin 300161
  • Online: 2008-11-05 Published: 2008-11-05

摘要: 分析大规模中文搜索日志中的查询重复性,通过对查询重复率和用户个体查询重复率等数据的统计发现:查询串的查询频率、文档的点击频率及用户查询频率均符合Zipf分布,查询重复率较高。查询历史越长,查询重复率越高。高查询频率用户的查询重复率较高。以上数据为中文搜索引擎的改进提供了有力的依据。

关键词: 搜索引擎, 日志分析, 重复性, Zipf分布

Abstract: This paper analyzes query repetition in a large-scale Chinese search engine log and reports detailed statistics on the overall query repetition ratio and the per-user query repetition ratio. The key findings are: query frequency, document click frequency, and per-user query frequency all follow Zipf distributions; the query repetition ratio is high; the ratio increases as the query history grows longer; and users who search more frequently have higher query repetition ratios. These findings provide a solid basis for improving Chinese search engines.

Key words: search engine, log analysis, repetition, Zipf distribution
