
Computer Engineering ›› 2011, Vol. 37 ›› Issue (10): 44-46. doi: 10.3969/j.issn.1000-3428.2011.10.014

• Software Technology and Database •

Web User Clustering Method Based on Multiple Evaluating Factors

WU Jin-qiao 1, CAO Qi-ying 1, HE Xia-yan 2, ZHUANG Yi-wen 1

  (1. School of Computer Science and Technology, Donghua University, Shanghai 201600, China; 2. School of Electronic, Information and Electrical Engineering, Shanghai Jiaotong University, Shanghai 200240, China)
  • Online: 2011-05-20 Published: 2011-05-20
  • About the authors: WU Jin-qiao (born 1987), male, M.S., main research interest: data mining; CAO Qi-ying, professor; HE Xia-yan and ZHUANG Yi-wen, M.S.
  • Supported by:

    Key Project of the Science and Technology Fund of the Ministry of Education (104086)

Abstract:

This paper introduces the pre-processing procedure of Web log mining, which includes data cleaning, Website topology identification, user identification, session identification, page filtering and path completion. For logs recorded without a referrer field, a path completion algorithm based on the Website topology graph is proposed and implemented. A user similarity computing method that combines multiple evaluating factors is discussed and applied to Web user clustering. The Davies-Bouldin index is used to measure the quality of the clustering, and experimental results are given.

Key words: log pre-processing, path completion, user similarity, fuzzy clustering
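
The abstract names the Davies-Bouldin index as the measure of clustering quality but does not give its formula. The following is a minimal sketch of the standard definition, not the authors' implementation. It assumes each user is represented by a numeric feature vector, that clusters are crisp (each user in exactly one cluster), that distances are Euclidean, and that no two cluster centroids coincide. The function names `centroid`, `euclidean`, and `davies_bouldin` are illustrative, and the paper's own multi-factor similarity measure is not reproduced here. Lower values indicate more compact, better separated clusters.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters.

    clusters: list of at least two non-empty lists of numeric vectors.
    Assumes cluster centroids are pairwise distinct (otherwise the
    separation term is zero and the ratio is undefined).
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centers = [centroid(c) for c in clusters]
    # Scatter of a cluster: mean distance from its members to its centroid.
    scatter = [
        sum(euclidean(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centers)
    ]

    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case ratio of combined scatter to centroid separation.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


if __name__ == "__main__":
    # Hypothetical two-dimensional user vectors, for illustration only.
    loose = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]
    tight = [[[0, 0], [0, 1]], [[10, 0], [10, 1]]]
    print(davies_bouldin(loose))
    print(davies_bouldin(tight))
```

In the illustrative data, the `tight` grouping has half the within-cluster scatter of `loose` at the same centroid separation, so its index is half as large.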

CLC Number: