Computer Engineering, 2012, Vol. 38, Issue 16: 70-73. doi: 10.3969/j.issn.1000-3428.2012.16.017

• Software Technology and Database •

Distributed Data Stream Clustering Algorithm Based on Density Grid

LIN Xiu-dan 1, MAO Guo-jun 2

  1. College of Computer, Beijing University of Technology, Beijing 100124, China; 2. School of Information, Central University of Finance and Economics, Beijing 100081, China
  • Received: 2011-11-09 Revised: 2011-12-08 Online: 2012-08-20 Published: 2012-08-17
  • About the authors: LIN Xiu-dan (born 1985), female, master's student, main research interest: data mining; MAO Guo-jun, professor
  • Supported by:
    National Natural Science Foundation of China (60873145)

Abstract: This paper proposes a density grid-based clustering algorithm for distributed data stream environments. Each local site updates its data stream summary quickly, so that its grid space reflects the current state of the stream. The center site receives and merges the local grid structures, then performs density grid clustering and noise grid optimization on the global grid structure to produce the global clustering result. Experimental results show that the algorithm reduces network traffic and improves global clustering accuracy.
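
The local-site and center-site roles described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `LocalSite`, `merge`, and `cluster`, the fixed cell width, the count-based sliding window, the single density threshold, and the face-adjacency rule are all assumptions made for illustration. The paper's noise grid optimization step is not reproduced; here, cells below the threshold are simply left unassigned.

```python
from collections import Counter, deque


def cell_of(point, width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // width) for x in point)


class LocalSite:
    """Hypothetical local site: per-cell counts over a sliding window."""

    def __init__(self, width, window):
        self.width = width        # assumed uniform cell width
        self.capacity = window    # assumed count-based window size
        self.window = deque()
        self.counts = Counter()

    def add(self, point):
        # Incremental update: count the new cell, expire the oldest one.
        cell = cell_of(point, self.width)
        self.window.append(cell)
        self.counts[cell] += 1
        if len(self.window) > self.capacity:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def summary(self):
        # Only cell counts leave the site, not the raw points.
        return dict(self.counts)


def merge(summaries):
    """Center site: add up the per-cell counts from all local sites."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


def cluster(grid, min_density):
    """Group face-adjacent dense cells; sparse cells stay unassigned."""
    dense = {c for c, n in grid.items() if n >= min_density}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            cell = stack.pop()
            members.add(cell)
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(members)
    return clusters


site_a = LocalSite(width=1.0, window=100)
site_b = LocalSite(width=1.0, window=100)
for p in [(0.1, 0.2), (0.4, 0.9), (1.2, 0.5)]:
    site_a.add(p)
for p in [(0.3, 0.3), (1.7, 0.1), (5.5, 5.5)]:
    site_b.add(p)

global_grid = merge([site_a.summary(), site_b.summary()])
clusters = cluster(global_grid, min_density=2)
```

With these sample points the merged grid holds three cells, and the threshold of 2 yields one cluster made of cells (0, 0) and (1, 0), while cell (5, 5) is left as noise. Sending per-cell counts instead of raw points is what keeps the traffic between sites small in this sketch.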

Key words: distributed data stream, density grid, clustering, noise, sliding window, incremental update
