Computer Engineering, 2009, Vol. 35, Issue (22): 44-46. doi: 10.3969/j.issn.1000-3428.2009.22.015

• Software Technology and Database •

Data Preprocessing Techniques in Web Log Mining

LI Yan1,2, FENG Bo-qin1, LU Xiao-feng2

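  (1. School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049; 2. School of Computer Science and Engineering, Xi'an University of Technology, Xi'an 710048)
  • Online: 2009-11-20 Published: 2009-11-20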

Abstract:

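Data preprocessing is an important step in Web log mining. It is generally divided into four sub-steps: data cleaning, user identification, session identification, and path completion. To eliminate the effects that proxy servers, firewalls, and local caches have on Web logs, a referer-based analysis method is used to perform user session identification and path completion. Experimental results show that, when the referer information recorded in the Web access log is fairly complete, the method obtains users' access paths efficiently.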
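
The abstract names the four preprocessing steps and a referer-based method but gives no algorithmic detail, so the Python sketch below illustrates one common reading of referer-based session identification and path completion. It is not the authors' implementation. Everything in it is an assumption: the `Request` record (log lines already parsed, referers normalized to the same path form as requested URLs, `"-"` when absent), the cleaning rule (`NOISE_SUFFIXES`, status below 400), and the user-identification heuristic (same IP and same User-Agent). A request joins the user's most recently active session that already contains its referer, otherwise it opens a new session. If the referer is an earlier page rather than the last one logged, the sketch assumes the user went back through cached pages and replays that backward walk into the path.

```python
from dataclasses import dataclass, field

# Hypothetical record: one already-parsed log line. Assumes `referer` has been
# normalized to the same path form as `url`, with "-" when the log has none.
@dataclass
class Request:
    ip: str
    agent: str
    url: str
    referer: str
    status: int
    time: float  # seconds

# Illustrative cleaning rule (an assumption, not the paper's rule).
NOISE_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")

def clean(requests):
    """Data cleaning: drop embedded resources and requests with status >= 400."""
    return [r for r in requests
            if r.status < 400 and not r.url.lower().endswith(NOISE_SUFFIXES)]

def user_key(r):
    """User identification heuristic (assumed): same IP and same User-Agent."""
    return (r.ip, r.agent)

@dataclass
class Session:
    path: list = field(default_factory=list)
    last_time: float = 0.0

def last_index(seq, item):
    """Position of the last occurrence of item in seq."""
    return len(seq) - 1 - seq[::-1].index(item)

def sessionize(requests):
    """Referer-based session identification with path completion (sketch)."""
    sessions_by_user = {}
    for r in sorted(clean(requests), key=lambda x: x.time):
        sessions = sessions_by_user.setdefault(user_key(r), [])
        candidates = [s for s in sessions if r.referer in s.path]
        if not candidates:
            # Referer absent or never seen for this user: start a new session.
            s = Session()
            sessions.append(s)
        else:
            # Join the most recently active session that contains the referer.
            s = max(candidates, key=lambda c: c.last_time)
            k = last_index(s.path, r.referer)
            # Path completion: if the referer is not the last page, the user
            # went back through cached pages that never reached the server,
            # so replay that backward walk (empty when k is the last index).
            s.path.extend(reversed(s.path[k:-1]))
        s.path.append(r.url)
        s.last_time = r.time
    return [s.path for sessions in sessions_by_user.values() for s in sessions]

log = [
    Request("10.0.0.1", "UA", "/A", "-", 200, 0),
    Request("10.0.0.1", "UA", "/B", "/A", 200, 10),
    Request("10.0.0.1", "UA", "/logo.gif", "/B", 200, 11),
    Request("10.0.0.1", "UA", "/C", "/B", 200, 20),
    Request("10.0.0.1", "UA", "/D", "/A", 200, 30),
]
print(sessionize(log))  # [['/A', '/B', '/C', '/B', '/A', '/D']]
```

In the example, `/logo.gif` is removed by cleaning, and the request for `/D` names `/A` as its referer although the last logged page was `/C`. The sketch therefore infers that the user returned to `/A` through the browser cache and inserts `/B` and `/A` into the path. A fuller version would usually also split sessions on an inactivity timeout, which is omitted here.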

Keywords: Web log mining, data preprocessing, user session identification, path completion
