
Computer Engineering, 2009, Vol. 35, Issue (7): 49-51. doi: 10.3969/j.issn.1000-3428.2009.07.016

• Software Technology and Database •

Improved Method for Session Identification in Web Log Preprocessing

FANG Yuan-kang1,2, HU Xue-gang1, XIA Qi-shou2

1. Computer & Information College, Hefei University of Technology, Hefei 230009; 2. Computer Center, Chizhou College, Chizhou 247000
• Online: 2009-04-05  Published: 2009-04-05

Abstract: Session identification is an important step in data preprocessing for Web log mining. This paper proposes an improved session identification algorithm. After users are identified, frame pages are filtered out, which greatly reduces the number of effective Web pages in the experiment. An access time threshold is then set for each page and adjusted according to the page's importance, which is determined by the page content and the site structure. Compared with the traditional method that applies a single, experimentally chosen threshold to all Web pages, the approach determines the access time threshold more accurately. Experiments show that the algorithm yields a more realistic set of sessions and thus improves the quality of transaction sessions.

Key words: Web mining, data preprocessing, threshold, frame page, session identification
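The abstract gives no formula for the per-page threshold or for page importance, so the following Python sketch is only one plausible reading under stated assumptions: frame pages are known in advance as a set, each page's threshold is a base timeout multiplied by a hypothetical importance weight (the `importance` dictionary), and a session ends when the time spent on a page exceeds that page's threshold. `Request`, `filter_frames`, `page_threshold` and `sessionize` are illustrative names, not the authors' code. The toy log contrasts a single uniform threshold with a per-page one.

```python
from dataclasses import dataclass


@dataclass
class Request:
    user: str    # user ID produced by the earlier user-identification step
    url: str     # requested page
    time: float  # request timestamp in seconds


def filter_frames(requests, frame_urls):
    """Drop requests for frame pages (assumed to be known in advance as a set)."""
    return [r for r in requests if r.url not in frame_urls]


def page_threshold(url, base, importance):
    """Hypothetical rule: scale a base timeout by the page's importance weight
    (weight 1.0 when the page has none, i.e. the uniform threshold)."""
    return base * importance.get(url, 1.0)


def sessionize(requests, base, importance):
    """Split each user's request stream into sessions. A new session starts when
    the gap after page p exceeds the threshold assigned to p."""
    by_user = {}
    for r in sorted(requests, key=lambda r: (r.user, r.time)):
        by_user.setdefault(r.user, []).append(r)

    sessions = []
    for user, reqs in by_user.items():
        current = [reqs[0]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur.time - prev.time > page_threshold(prev.url, base, importance):
                sessions.append((user, [q.url for q in current]))
                current = []
            current.append(cur)
        sessions.append((user, [q.url for q in current]))
    return sessions


# Toy log for one user; the frame page is filtered before sessionization.
log = [
    Request("u1", "/index.html", 0),
    Request("u1", "/menu_frame.html", 1),
    Request("u1", "/article.html", 60),
    Request("u1", "/contact.html", 1060),  # 1000 s after opening the article
]
clean = filter_frames(log, {"/menu_frame.html"})

# Uniform 600 s threshold for every page: the long read splits the visit.
print(sessionize(clean, 600, {}))
# -> [('u1', ['/index.html', '/article.html']), ('u1', ['/contact.html'])]

# Content-rich article weighted 2.0 (threshold 1200 s): one session.
print(sessionize(clean, 600, {"/article.html": 2.0}))
# -> [('u1', ['/index.html', '/article.html', '/contact.html'])]
```

A multiplicative weight is used here only because it reduces to the uniform-threshold baseline when every weight is 1.0; the paper derives importance from page content and site structure, which this sketch does not model.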
