Computer Engineering (计算机工程)

• Advanced Computing and Data Processing

Time Series Clustering Algorithm Based on Region Extreme Points

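SUN Ya a,c, LI Zhihua a,b,c

  1. (Jiangnan University: a. Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, College of Internet of Things Engineering; b. Engineering Research Center of Internet of Things Technology Application, Ministry of Education; c. Department of Computer Science and Technology, College of Internet of Things Engineering; Wuxi, Jiangsu 214122, China)
  • Received: 2014-06-05 Online: 2015-05-15 Published: 2015-05-15
  • About the authors: SUN Ya (born 1990), female, master's student; main research interests: Internet of Things technology and data mining. LI Zhihua, associate professor, Ph.D.
  • Supported by:
    Fundamental Research Funds for the Central Universities (JUSRP211A41); Jiangsu Province Industry-University-Research Prospective Research Fund (BY2013015-23).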

Clustering Algorithm for Time Series Based on Locally Extreme Point

SUN Ya a,c, LI Zhihua a,b,c

  1. (a. Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, College of Internet of Things Engineering; b. Engineering Research Center of Internet of Things Technology Application, Ministry of Education; c. Department of Computer Science and Technology, College of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China)
  • Received: 2014-06-05 Online: 2015-05-15 Published: 2015-05-15

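Abstract: Dissimilarity measurement and similarity measurement are two fundamental problems in data mining. To address the dissimilarity measurement of time series, this paper defines the region radius, the region extreme point, and the region of a time series, and proposes a strategy for extracting region extreme points. Extracting representative extreme points reduces and compresses the time series data, and the dynamic time warping distance between time series is then defined to measure their dissimilarity. On this basis, a new hierarchical clustering algorithm for time series is proposed. Simulation results show that, compared with algorithms such as time series trend feature extraction, the proposed algorithm clearly improves both data compression and clustering accuracy.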
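The dynamic time warping distance mentioned in the abstract can be illustrated with a minimal sketch. It uses the textbook DTW recurrence with an absolute-difference local cost, which is an assumption and not necessarily the exact definition given in the paper; the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Assumption: the local cost is |a[i] - b[j]|; the paper may define it differently.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because DTW allows one point to be matched against several, sequences of different lengths (such as extreme-point sequences extracted from different series) can be compared directly.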

Keywords: time series, region extreme point, re-description, data compression, similarity measure, hierarchical clustering

Abstract: Dissimilarity or similarity measurement is a key issue in data mining. Time series data are hard to measure because of their original structure. To address the problem of time series similarity measurement, this paper proposes a re-description method based on the locally extreme points of a time series. The original time series is described by the locally extreme points extracted from it, which effectively reflects its main features and compresses the data. Measuring the extreme-point series after equal-length treatment makes the algorithm more flexible and reduces its limitations. On this basis, the method is applied to hierarchical clustering of time series. Simulation results show that the clustering effect and data compression are evident, and that clustering accuracy improves greatly compared with other algorithms based on time series trend feature extraction.
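The idea of re-describing a series by its locally extreme points can be sketched as follows. The abstract does not give the formal definitions, so this sketch assumes that a point is kept when it is the maximum or minimum of the window of `radius` samples on either side of it, and that the two endpoints are always kept; the name `region_extreme_points` and both assumptions are illustrative only.

```python
def region_extreme_points(series, radius):
    """Return (index, value) pairs that are extreme within +/- radius samples.

    Assumptions (not taken from the paper): endpoints are always kept, and
    ties count as extremes, so a flat stretch is not compressed.
    """
    n = len(series)
    kept = []
    for i, x in enumerate(series):
        # Clip the window to the bounds of the series.
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = series[lo:hi]
        is_endpoint = i == 0 or i == n - 1
        if is_endpoint or x == max(window) or x == min(window):
            kept.append((i, x))
    return kept
```

For example, with `series = [0, 1, 3, 1, 0, -2, 0, 1]` and `radius = 1`, only the two endpoints, the peak at index 2, and the trough at index 5 are kept, so eight samples are reduced to four.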

Key words: time series, locally extreme point, re-description, data compression, similarity measure, hierarchical clustering

CLC number: