
Computer Engineering ›› 2008, Vol. 34 ›› Issue (12): 34-36. doi: 10.3969/j.issn.1000-3428.2008.12.012

• Doctoral Dissertation •

Characteristics Analysis of Event Stream Data Based on Sequence Clustering

WANG Yong1,2, WANG Jie3, WANG Ming-hua4, JIAO Li-mei1

  (1. National Research Center for Intelligent Computing System, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; 2. Graduate University of Chinese Academy of Sciences, Beijing 100039; 3. Information Engineering College, Capital Normal University, Beijing 100037; 4. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-06-20 Published: 2008-06-20

Abstract: Event stream processing is an emerging class of applications that analyze and query massive data entering a system in real time, and data characteristics are an important part of the workload model needed to evaluate such systems. With network security monitoring as the background, this paper proposes an approach that aggregates event streams into time series and characterizes the data by similarity clustering. Event streams are converted into time series by aggregating them at an appropriate time granularity, and the seasonal (periodic) component of each series is chosen to represent the original series so as to remove random noise. A clustering algorithm based on linear similarity between series, that is, similarity under scaling and shifting transformations, is presented. Clustering experiments on real data show that event streams with similar temporal characteristics are effectively grouped into the same cluster.

Key words: data characteristics, time series, clustering, event stream
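The pipeline outlined in the abstract (aggregate, keep the periodic part, cluster by similarity under scaling and shifting) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes Pearson correlation as the scale- and shift-invariant similarity measure, a simple per-phase average as the seasonal component, and a greedy leader-based clustering pass. All function names, the bucket width, the period, and the 0.9 threshold are hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def aggregate(timestamps, bucket_seconds, n_buckets):
    """Count events per fixed-width time bucket, turning a stream into a series."""
    counts = Counter(int(t // bucket_seconds) for t in timestamps)
    return [counts.get(i, 0) for i in range(n_buckets)]


def seasonal_profile(series, period):
    """Average the values at each phase of the period to damp random noise.

    Assumes len(series) >= period so that every phase has at least one value.
    """
    profile = []
    for phase in range(period):
        values = series[phase::period]
        profile.append(sum(values) / len(values))
    return profile


def linear_similarity(x, y):
    """Pearson correlation (assumed measure): 1.0 when y = a*x + b with a > 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    norm = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if norm == 0:
        return 0.0  # a constant series has no shape to compare
    return sum(a * b for a, b in zip(dx, dy)) / norm


def cluster(profiles, threshold=0.9):
    """Greedy single pass: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of indices; the first index is the leader
    for i, profile in enumerate(profiles):
        for members in clusters:
            if linear_similarity(profiles[members[0]], profile) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy data: stream 1 is a scaled and shifted copy of stream 0; stream 2 has a different shape.
base = [1, 3, 8, 3]
streams = [
    base * 3,
    [2 * v + 5 for v in base] * 3,
    [8, 3, 1, 3] * 3,
]
profiles = [seasonal_profile(s, period=4) for s in streams]
print(cluster(profiles))  # [[0, 1], [2]]
```

In this toy run the first two streams land in one cluster because they differ only by scale and offset, while the third, whose peak falls at a different phase, forms its own cluster. A real workload study would likely need a more careful seasonal decomposition and a tuned threshold.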
