
Computer Engineering, 2012, Vol. 38, Issue (01): 55-58, 61. doi: 10.3969/j.issn.1000-3428.2012.01.014

• Software Technology and Database •

Research on Data Stream Frequent Pattern Mining Algorithm in Landmark Window

ZHANG Guang-lu 1, LEI Jing-sheng 2, WU Xing-hui 1

  (1. School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China; 2. School of Computer Science & Technology, Nanjing University of Posts and Telecommunications, Nanjing 210046, China)
  • Received: 2011-06-28 Online: 2012-01-05 Published: 2012-01-05
  • About the authors: ZHANG Guang-lu (b. 1978), female, lecturer; main research interests: database technology, data mining, fuzzy information systems. LEI Jing-sheng, professor, Ph.D. WU Xing-hui, lecturer, M.S.
  • Supported by: Natural Science Foundation of Hainan Province (610221, 109002, 808155); Youth Research Foundation of Hainan Normal University (QN0923)

Abstract: A data stream is often too large to be stored in its entirety or to be scanned more than once. Building on a study of existing mining algorithms, this paper proposes DSMFP_LW, an algorithm for mining frequent patterns over a data stream in a landmark window. DSMFP_LW stores the global critical frequent patterns compactly in an extended prefix pattern tree, which allows it to collect pattern counts in a single scan of the stream and to update the data incrementally. Experimental results show that DSMFP_LW achieves better time and space efficiency than the well-known Lossy Counting algorithm in the same streaming environment.

Key words: landmark window, frequent pattern, data stream, DSMFP_LW algorithm, sliding window
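
The abstract uses Lossy Counting as its baseline. For orientation, the following is a minimal Python sketch of that baseline for single items only, counted over a landmark window that starts at the first element of the stream. It is not the authors' DSMFP_LW algorithm and it does not handle itemsets or prefix trees; the class name `LossyCounter` and its method names are hypothetical and chosen here for illustration.

```python
import math


class LossyCounter:
    """Sketch of Lossy Counting for single items (hypothetical names)."""

    def __init__(self, epsilon):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # elements seen since the landmark
        self.entries = {}                    # item -> [count, max undercount]

    def add(self, item):
        """Process one stream element in a single pass."""
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have been seen and pruned in earlier buckets,
            # so its true count may exceed 1 by at most bucket - 1.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def frequent(self, support):
        """Return items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {
            key: value[0]
            for key, value in self.entries.items()
            if value[0] >= threshold
        }


counter = LossyCounter(epsilon=0.25)
for element in ["a", "b", "a", "c", "a", "d", "a", "e"]:
    counter.add(element)
print(counter.frequent(support=0.5))  # {'a': 4}
```

The sketch keeps only a small dictionary of candidate items and prunes it at every bucket boundary, which is how the stream can be processed in one pass without being stored in full. Stored counts may underestimate true counts by at most `epsilon * n`.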

CLC Number: