
Computer Engineering, 2009, Vol. 35, Issue (14): 58-59. doi: 10.3969/j.issn.1000-3428.2009.14.020

• Software Technology and Database •

Mining of Frequent Traversal Sequences in Dynamic Web Clickstreams

ZHANG Xiao-jian, SHAO Chao, ZHANG Ya-dong

  1. (College of Information, Henan University of Finance & Economics, Zhengzhou 450002)
  • Online: 2009-07-20  Published: 2009-07-20

Abstract: When the false-positive approach is used to mine Frequent Traversal Sequences (FTS) in Web clickstreams, a relaxation ratio ρ controls both the memory consumption and the precision of the mining result. Because this single parameter governs both, such approaches suffer from a conflict between precision and memory consumption. To solve this problem, this paper proposes FTS-Stream, an algorithm based on the false-negative approach and a time-sensitive sliding window. FTS-Stream constrains ρ with two bounds and replaces ρ with a Weighted Harmonic Average (WHA) of those two bounds. Experiments show that the algorithm performs better than comparable established FTS mining methods.
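The abstract does not say how FTS-Stream derives its two bounds or chooses the weights, so the Python sketch below only illustrates the two generic ingredients it names: a weighted harmonic average of a lower and an upper bound on ρ, and a time-sensitive sliding window over clickstream sequences. The `weight` parameter (a hypothetical stand-in for the regulatory factor), the class `TimeSlidingWindow`, and its exact-count support test are assumptions for illustration; this is not the authors' FTS-Stream algorithm or its false-negative pruning.

```python
from collections import Counter, deque
from typing import Deque, Dict, Hashable, Tuple

# A traversal sequence is modeled as a tuple of page identifiers (assumption).
Sequence = Tuple[Hashable, ...]


def weighted_harmonic_average(lower: float, upper: float, weight: float) -> float:
    """Weighted harmonic average of two positive bounds.

    `weight` is a hypothetical regulatory factor: it weights `lower`,
    and (1 - weight) weights `upper`. The result always lies in [lower, upper].
    """
    if not (0.0 < lower <= upper):
        raise ValueError("bounds must satisfy 0 < lower <= upper")
    if not (0.0 <= weight <= 1.0):
        raise ValueError("weight must lie in [0, 1]")
    return 1.0 / (weight / lower + (1.0 - weight) / upper)


class TimeSlidingWindow:
    """Hypothetical time-based window: keeps sequences newer than now - span."""

    def __init__(self, span: float) -> None:
        self.span = span
        self.items: Deque[Tuple[float, Sequence]] = deque()
        self.counts: Counter = Counter()

    def add(self, timestamp: float, seq: Sequence) -> None:
        # Timestamps are assumed to arrive in non-decreasing order.
        self.items.append((timestamp, seq))
        self.counts[seq] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Evict sequences that have fallen out of the time span.
        while self.items and self.items[0][0] <= now - self.span:
            _, old = self.items.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequent(self, min_support: float) -> Dict[Sequence, int]:
        """Exact counts of sequences whose relative support is >= min_support."""
        n = len(self.items)
        if n == 0:
            return {}
        return {s: c for s, c in self.counts.items() if c / n >= min_support}
```

For example, `weighted_harmonic_average(0.1, 0.5, 0.5)` gives about 0.167. Moving the weight toward 1 pulls the result toward the lower bound and moving it toward 0 pulls it toward the upper bound, so a single factor can trade one bound against the other while the value stays inside the interval.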

Key words: Frequent Traversal Sequence (FTS), Weighted Harmonic Average (WHA), regulatory factor
