Abstract:
Event stream processing is an emerging class of applications that analyze and query massive volumes of data entering the system in real time, and data characteristics are an important component of the workload models used to evaluate such systems. Taking network security monitoring as the application background, this paper presents an approach that aggregates event streams into time series and characterizes the data by similarity clustering. Event streams are converted into time series by aggregation at a moderate time granularity, and the seasonal component of each time series is chosen to represent the original series so that random noise is suppressed. A clustering algorithm based on similarity under scaling and shifting transformations is presented. Experiments on real data show that event streams with similar temporal characteristics are efficiently grouped into the same cluster.
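The preprocessing steps described above (aggregating an event stream into a time series, then keeping only its seasonal component) can be illustrated with a minimal sketch. It assumes events arrive as timestamps in seconds, uses a hypothetical fixed `bin_width` and `period` (for example, 1-hour bins with a 24-bin daily period), and approximates the seasonal component by a simple phase average centred to zero mean. No trend is removed first, unlike a classical decomposition, and the function names are illustrative rather than taken from the paper.

```python
from typing import List, Sequence


def aggregate(timestamps: Sequence[float], start: float,
              bin_width: float, n_bins: int) -> List[int]:
    """Count events per fixed-width time bin; events outside the window are dropped."""
    counts = [0] * n_bins
    for t in timestamps:
        idx = int((t - start) // bin_width)  # index of the bin this event falls into
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def seasonal_component(series: Sequence[float], period: int) -> List[float]:
    """Phase-averaged seasonal profile of length `period`, centred to zero mean.

    Simplifying assumption: no trend is removed before averaging.
    """
    if len(series) < period:
        raise ValueError("series must cover at least one full period")
    sums = [0.0] * period
    hits = [0] * period
    for i, x in enumerate(series):
        sums[i % period] += x
        hits[i % period] += 1
    profile = [s / h for s, h in zip(sums, hits)]  # mean value at each phase
    level = sum(profile) / period
    return [p - level for p in profile]


if __name__ == "__main__":
    # Hypothetical alert timestamps (seconds) aggregated into 1-hour bins.
    ts = [0, 10, 3600, 7300, 7400, 7500, 20000]
    print(aggregate(ts, start=0, bin_width=3600, n_bins=4))  # [2, 1, 3, 0]
    print(seasonal_component([1, 2, 3, 3, 4, 5], period=3))  # [-1.0, 0.0, 1.0]
```

The bin width controls the trade-off the abstract calls "moderate granularity": bins that are too fine leave mostly noise, while bins that are too coarse blur the periodic pattern.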
Key words:
data characteristics,
time series,
clustering,
event stream
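Abstract (Chinese version): Event stream processing is a class of applications that has emerged in recent years for analyzing and querying massive data entering a system in real time, and data characteristics are an important part of the workload models needed to evaluate such systems. Against the background of network security monitoring, a method is proposed that aggregates event streams into time series and clusters them by similarity to analyze data characteristics. Event streams are converted into time series by aggregation at an appropriate granularity, the periodic component of each time series is selected as its representative to eliminate random interference, and a clustering algorithm based on the linear similarity of sequences is given. Clustering experiments show that event streams with similar temporal characteristics can be effectively grouped into the same cluster.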
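The clustering step based on linear similarity can be illustrated with a minimal sketch. It assumes two equal-length series count as similar when one is approximately a positive scaling plus a shift of the other (y ≈ a·x + b with a > 0). Both series are z-normalized and their root-mean-square difference is taken; this distance equals sqrt(2(1 − r)), where r is the Pearson correlation, so it ranges from 0 to 2. The grouping step is a simple greedy leader clustering with a hypothetical `threshold`. It is a swapped-in stand-in for illustration, not the algorithm from the paper, and all function names are illustrative.

```python
import math
from typing import List, Sequence


def z_normalize(x: Sequence[float]) -> List[float]:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n  # constant series carries no shape information
    return [(v - mean) / std for v in x]


def linear_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """RMS difference after z-normalization; invariant to y -> a*y + b with a > 0."""
    zx, zy = z_normalize(x), z_normalize(y)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(zx, zy)) / len(zx))


def threshold_cluster(series: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Greedy leader clustering: join the first cluster whose leader is within threshold."""
    leaders: List[Sequence[float]] = []
    labels: List[int] = []
    for s in series:
        for k, leader in enumerate(leaders):
            if linear_distance(s, leader) <= threshold:
                labels.append(k)
                break
        else:  # no existing leader is close enough: s starts a new cluster
            leaders.append(s)
            labels.append(len(leaders) - 1)
    return labels


if __name__ == "__main__":
    a = [0, 1, 2, 1, 0]
    scaled = [2 * v + 10 for v in a]    # scaled and shifted copy of a
    inverted = [2 - v for v in a]       # opposite shape (r = -1)
    shifted = [v + 5 for v in a]        # shifted copy of a
    print(threshold_cluster([a, scaled, inverted, shifted], threshold=0.3))  # [0, 0, 1, 0]
```

In a pipeline following the abstract, the inputs to `threshold_cluster` would be the seasonal profiles of the aggregated event streams rather than the raw counts. The result of leader clustering depends on input order, which a hierarchical method would avoid.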
CLC Number:
WANG Yong; WANG Jie; WANG Ming-hua; JIAO Li-mei. Characteristics Analysis of Event Stream Data Based on Sequence Clustering[J]. Computer Engineering, 2008, 34(12): 34-36.