Abstract:
This paper describes the pre-processing stage of Web log mining, which comprises data cleaning, site topology identification, user identification, session identification, page filtering, and path completion. For logs whose records lack a referrer field, a path completion algorithm based on the site topology graph is proposed and implemented. A user similarity measure that combines multiple evaluation factors is discussed and applied to Web user clustering. The Davies-Bouldin index is used to evaluate clustering quality, and experimental results are reported.
Key words:
log pre-processing,
path completion,
user similarity,
fuzzy clustering
CLC number:
WU Jin-Qiao, CAO Qi-Ying, HE Xia-Yan, ZHUANG Yi-Wen. Web User Clustering Method Based on Multiple Evaluating Factors[J]. Computer Engineering, 2011, 37(10): 44-46.
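Note on the evaluation measure: the abstract names the Davies-Bouldin index but does not restate its definition. The sketch below shows the standard form of the index, in which lower values indicate more compact and better separated clusters. It is only an illustration under assumptions not taken from the paper: users are represented as numeric feature vectors, distances are Euclidean (the paper's multi-factor similarity would take its place), and each user carries one hard cluster label (for a fuzzy clustering, for example the cluster with the highest membership). The function name `davies_bouldin` is hypothetical.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a hard clustering (lower is better).

    Assumptions for illustration only: points are numeric tuples,
    distance is Euclidean, and every point has exactly one label.
    """
    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # Centroid of each cluster: coordinate-wise mean of its members.
    centroids = {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean distance of its members to the centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    # For each cluster take the worst (largest) similarity ratio to any
    # other cluster, then average these worst cases over all clusters.
    # Two clusters with identical centroids would cause a division by zero.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)
```

For two clusters with centroids 10 units apart and a mean member-to-centroid distance of 1 each, the ratio is (1 + 1) / 10, so the index is 0.2.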