Computer Engineering, 2011, Vol. 37, Issue 16: 24-26. doi: 10.3969/j.issn.1000-3428.2011.16.008

• Doctoral Papers •

Applicability Evaluation on Two Classes of Frequent Items Algorithms in NetFlow

ZHOU Jun 1,2, CHEN Ming 1, ZHANG Jia-ming 3

  1. Institute of Command Automation, PLA University of Science and Technology, Nanjing 210007, China; 2. Institution of Command Automation for Logistics Department, Second Artillery Corps, Beijing 100085, China; 3. Unit 96627, PLA, Beijing 100085, China
  • Received: 2010-12-07 Online: 2011-08-20 Published: 2011-08-20
  • About the authors: ZHOU Jun (born 1975), male, engineer and Ph.D. candidate, main research interest: network anomalous traffic detection; CHEN Ming, professor, Ph.D.; ZHANG Jia-ming, engineer
  • Supported by:

    National "863" Program of China (2007AA01Z418)

Abstract: A simulation environment that generates network flows from packet capture (trace) files is established to evaluate the applicability of two classic classes of frequent-item mining algorithms, count-based and sketch-based, to NetFlow data. Queries in the experiments use the landmark window model. Experimental results show that the recall of every algorithm is no less than 98%. Precision is only weakly correlated with query granularity, but strongly correlated with the number of items and with the distribution of frequent items in the stream. The frequency estimation error of the sketch-based algorithms is relatively stable, whereas that of the count-based algorithms is larger. In execution efficiency, the count-based algorithms clearly outperform the sketch-based algorithms.

Key words: data stream, frequent items, count-based algorithm, sketch-based algorithm, NetFlow, applicability
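
The following is a minimal illustrative sketch of the two algorithm classes and of the recall and precision metrics mentioned in the abstract. The abstract does not name the specific algorithms evaluated, so Misra-Gries (as a count-based example) and Count-Min (as a sketch-based example) are assumptions chosen here for illustration. The function and class names, the synthetic stream, the threshold `phi`, and the sketch dimensions are likewise hypothetical. The landmark window is modelled simply as "every item seen since the start of the stream".

```python
import hashlib
import random
from collections import Counter


def misra_gries(stream, k):
    """Count-based summary (Misra-Gries): keeps at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n
    is guaranteed to remain in the returned dict; counts are underestimates.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


class CountMin:
    """Sketch-based summary (Count-Min): estimates never undercount."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One salted hash per row; blake2b is used only for convenience.
        digest = hashlib.blake2b(
            repr(item).encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += 1

    def estimate(self, item):
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


def recall_precision(reported, truth):
    """Recall = hits / |truth|, precision = hits / |reported|."""
    reported, truth = set(reported), set(truth)
    hits = len(reported & truth)
    recall = hits / len(truth) if truth else 1.0
    precision = hits / len(reported) if reported else 1.0
    return recall, precision


# Synthetic "flow keys" (hypothetical data, not from the paper).
stream = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(20)]
random.Random(0).shuffle(stream)

phi = 0.1                              # frequency threshold (assumed)
threshold = phi * len(stream)
exact = Counter(stream)
truth = {item for item, count in exact.items() if count > threshold}

# Count-based: report every surviving counter as a candidate.
mg_reported = misra_gries(stream, k=int(1 / phi)).keys()

# Sketch-based: for simplicity, query the sketch for every distinct item.
cm = CountMin(width=64, depth=4)
for item in stream:
    cm.add(item)
cm_reported = {item for item in exact if cm.estimate(item) > threshold}

print("Misra-Gries recall, precision:", recall_precision(mg_reported, truth))
print("Count-Min   recall, precision:", recall_precision(cm_reported, truth))
```

In this sketch both summaries cannot miss a truly frequent item, so any loss of quality shows up as lower precision or as estimation error. A practical sketch-based implementation would track candidates with a heap rather than querying every distinct item, which is done here only to keep the example short.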

CLC Number: