Abstract: A data stream can be too large to store in full or to scan more than once. Building on a study of existing frequent pattern mining algorithms, this paper proposes DSMFP_LW, an algorithm for mining frequent patterns over a data stream in a landmark window. DSMFP_LW stores the global critical frequent patterns in an extended prefix pattern tree, which lets it scan the stream in a single pass and update the stored patterns incrementally as new data arrives. Experimental results show that DSMFP_LW achieves better time and space efficiency than the well-known Lossy Counting algorithm in the same streaming environment.
Key words:
landmark window,
frequent pattern,
data stream,
DSMFP_LW algorithm,
sliding window
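For orientation, the Lossy Counting baseline named in the abstract can be sketched as follows. This is a minimal illustration and not the DSMFP_LW algorithm, whose details the abstract does not give. It assumes the simplest setting, counting single items rather than itemsets, and the names `LossyCounter`, `add`, and `frequent` are hypothetical and chosen only for this example.

```python
import math


class LossyCounter:
    """Single-pass approximate frequency counter (Lossy Counting, single items).

    Simplifying assumption: the stream carries individual items, not itemsets.
    """

    def __init__(self, epsilon):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # number of items per bucket
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max_undercount]

    def add(self, item):
        """Process one stream element; prune at every bucket boundary."""
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        entry = self.entries.get(item)
        if entry is not None:
            entry[0] += 1
        else:
            # A new item may have been missed in up to (bucket - 1) earlier buckets.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Drop entries whose maximum possible count is at most the bucket id.
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def frequent(self, support):
        """Return items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {
            key: value[0]
            for key, value in self.entries.items()
            if value[0] >= threshold
        }
```

The counter keeps only items that could still be frequent, so memory stays bounded while each stream element is read exactly once. Stored counts underestimate true counts by at most `epsilon * n`.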
ZHANG Guang-Lu, LEI Jing-Sheng, WU Xing-Hui. Research on Data Stream Frequent Pattern Mining Algorithm in Landmark Window[J]. Computer Engineering, 2012, 38(01): 55-58,61.