Abstract: False-positive approaches to mining Frequent Traversal Sequences (FTSs) in Web clickstreams rely on a relaxation ratio ρ to control both memory consumption and the precision of the mining result. Because a single parameter governs both, tuning ρ to improve one comes at the expense of the other, so precision and memory consumption conflict. This paper proposes FTS-Stream, an algorithm based on a false-negative approach and a time-sensitive sliding window, to resolve this conflict. FTS-Stream constrains ρ with two bound parameters and replaces ρ with a Weighted Harmonic Average (WHA) of the two bounds. Experiments show that FTS-Stream outperforms comparable FTS mining methods.
Key words:
Frequent Traversal Sequence (FTS),
Weighted Harmonic Average (WHA),
regulatory factor
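The abstract does not say how the two bounds or their weights are chosen, so the following is only a minimal Python sketch of the averaging step, assuming two positive bounds (hypothetically named `rho_low` and `rho_high`) and non-negative weights. Reading the weights as the "regulatory factor" is an assumption, not something the abstract states. The weighted harmonic average of values x_i with weights w_i is Σw_i / Σ(w_i / x_i).

```python
def weighted_harmonic_average(rho_low: float, rho_high: float,
                              w_low: float = 1.0, w_high: float = 1.0) -> float:
    """Weighted harmonic average (WHA) of two positive bounds.

    rho_low, rho_high, w_low and w_high are hypothetical names used for
    illustration; they are not taken from the FTS-Stream paper.
    """
    if rho_low <= 0 or rho_high <= 0:
        raise ValueError("bounds must be positive")
    if w_low < 0 or w_high < 0 or w_low + w_high == 0:
        raise ValueError("weights must be non-negative and not both zero")
    # WHA = (w1 + w2) / (w1 / x1 + w2 / x2)
    return (w_low + w_high) / (w_low / rho_low + w_high / rho_high)


# Equal weights: 2 / (1/0.2 + 1/0.6) is approximately 0.3
print(weighted_harmonic_average(0.2, 0.6))
```

For positive bounds and weights, the result always lies between the two bounds, which is one way a single value can stay "constrained" by them. It is also pulled toward the smaller bound: the equal-weight result above (about 0.3) is below the arithmetic mean of 0.4.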
ZHANG Xiao-jian (张啸剑); SHAO Chao (邵超); ZHANG Ya-dong (张亚东). Mining of Frequent Traversal Sequences in Dynamic Web Clickstreams[J]. Computer Engineering (计算机工程), 2009, 35(14): 58-59.
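A time-sensitive sliding window is bounded by elapsed time rather than by a fixed number of transactions. The following minimal Python sketch shows only that idea, under several assumptions. Each session is a list of page identifiers. Candidate traversal sequences are contiguous sub-paths of length 2 to `max_len`, counted at most once per session. Support is the fraction of sessions in the window that contain the sub-path. The class and method names are hypothetical. The sketch keeps exact counts and does not reproduce FTS-Stream's false-negative pruning, which the abstract does not describe.

```python
from collections import Counter, deque
from typing import Deque, FrozenSet, List, Sequence, Tuple

Path = Tuple[str, ...]


class TimeSensitiveWindow:
    """Holds the sessions whose timestamps fall in (now - span, now]."""

    def __init__(self, span: float, max_len: int = 3) -> None:
        self.span = span
        self.max_len = max_len
        self._sessions: Deque[Tuple[float, FrozenSet[Path]]] = deque()
        self._support: Counter = Counter()  # path -> number of sessions containing it

    def _paths(self, clicks: Sequence[str]) -> FrozenSet[Path]:
        # Contiguous sub-paths of length 2..max_len; a set, so each counts once per session.
        n = len(clicks)
        return frozenset(
            tuple(clicks[i:i + k])
            for i in range(n)
            for k in range(2, self.max_len + 1)
            if i + k <= n
        )

    def _expire(self, now: float) -> None:
        # Time-based expiry: drop sessions older than the window span.
        while self._sessions and self._sessions[0][0] <= now - self.span:
            _, paths = self._sessions.popleft()
            self._support.subtract(paths)
            for p in paths:
                if self._support[p] <= 0:
                    del self._support[p]

    def add(self, timestamp: float, clicks: Sequence[str]) -> None:
        self._expire(timestamp)
        paths = self._paths(clicks)
        self._sessions.append((timestamp, paths))
        self._support.update(paths)

    def frequent(self, min_support: float, now: float) -> List[Path]:
        """Sub-paths contained in at least min_support of the sessions in the window."""
        self._expire(now)
        n = len(self._sessions)
        if n == 0:
            return []
        threshold = min_support * n
        return sorted(p for p, c in self._support.items() if c >= threshold)
```

For example, with `span=10`, a session at time 0 expires once a session arrives at time 12. Its sub-paths stop contributing to support at that point, even though the total number of sessions seen keeps growing.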