
Computer Engineering ›› 2012, Vol. 38 ›› Issue (16): 70-73. doi: 10.3969/j.issn.1000-3428.2012.16.017

• Networks and Communications •

Distributed Data Stream Clustering Algorithm Based on Density Grid

LIN Xiu-dan 1, MAO Guo-jun 2   

  (1. College of Computer, Beijing University of Technology, Beijing 100124, China; 2. School of Information, Central University of Finance and Economics, Beijing 100081, China)
  • Received: 2011-11-09 Revised: 2011-12-08 Online: 2012-08-20 Published: 2012-08-17

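  • About the authors: LIN Xiu-dan (b. 1985), female, master's student, research interest: data mining; MAO Guo-jun, professor
  • Supported by:
    National Natural Science Foundation of China (60873145)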

Abstract: A density grid-based clustering algorithm is proposed for distributed data stream environments. Each local site updates its grid space quickly as data arrive, so that the grid reflects current changes in the data stream. The center site collects and merges the grid structures of all local sites, then performs density-grid clustering and noise-grid optimization on the global grid structure to produce the global clustering result. Experimental results show that the algorithm reduces network traffic and improves global clustering accuracy.

Key words: distributed data stream, density grid, clustering, noise, sliding window, incremental update
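
The division of work described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the fixed cell width, the single density threshold `min_density`, and the rule that dense cells sharing a face or corner belong to one cluster are all assumptions made for illustration, and the sliding window and incremental update named in the key words are left out.

```python
from collections import Counter, deque
from itertools import product

def cell_of(point, width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // width) for x in point)

def local_summary(points, width):
    """Local site (hypothetical): count points per cell for the current data."""
    return Counter(cell_of(p, width) for p in points)

def merge_summaries(summaries):
    """Center site (hypothetical): add up per-cell counts from all local sites."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total

def cluster_dense_cells(grid, min_density):
    """Group adjacent dense cells; cells below min_density are treated as noise."""
    dense = {cell for cell, count in grid.items() if count >= min_density}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), set()
        while queue:  # breadth-first search over neighbouring dense cells
            cell = queue.popleft()
            members.add(cell)
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbour = tuple(c + o for c, o in zip(cell, offset))
                if neighbour in dense and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(members)
    noise = set(grid) - dense
    return clusters, noise

# Two local sites summarise their own points; only cell counts reach the center.
site_a = local_summary([(0.1, 0.2), (0.4, 0.3), (1.2, 0.5)], width=1.0)
site_b = local_summary([(0.7, 0.9), (1.5, 0.1), (5.5, 5.5)], width=1.0)
global_grid = merge_summaries([site_a, site_b])
clusters, noise = cluster_dense_cells(global_grid, min_density=2)
```

In this sketch each site transmits one count per occupied cell instead of its raw points, which is one plausible reading of how a grid summary keeps network traffic low.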

