Computer Engineering ›› 2014, Vol. 40 ›› Issue (12): 188-194. doi: 10.3969/j.issn.1000-3428.2014.12.035

• Artificial Intelligence and Recognition Technology •

Data Stream Clustering Algorithm Based on Extensible Grid and Density

XING Changzheng, WANG Xiaoxu

  1. College of Electronics and Information Engineering, Liaoning Technical University, Huludao 125105, China
  • Received: 2013-12-24 Revised: 2014-03-05 Online: 2014-12-15 Published: 2015-01-16
  • About the authors: XING Changzheng (born 1967), male, professor, Ph.D.; main research interests: artificial intelligence and data mining. WANG Xiaoxu, master's student.

Abstract: Existing clustering algorithms compute grid density without considering the influence of the surrounding space, which leads to unsmooth cluster boundaries. To address this, this paper proposes a data stream clustering algorithm based on extensible grid and density. By dynamically determining the grid expansion area, the algorithm extends the range of the grid density calculation from the grid itself to the adjacent grid space, and uses the cohesion degree introduced in the algorithm to measure the influence of surrounding data points on grid density. To outline the cluster edges more precisely, it uses a boundary point distance threshold function to separate the boundary points of a cluster from noise. It also gives an improved grid merging method that simplifies the condition for merging grid clusters according to inter-cluster connectivity, which effectively reduces the execution time. Experimental results show that the algorithm achieves high clustering quality and clustering efficiency.

Key words: clustering, extensible grid, grid density, cohesion degree, connectivity, boundary point
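
The idea of extending the density calculation to adjacent grids and then merging grids by connectivity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the cohesion degree formula, so the constant `neighbor_weight` below is a hypothetical stand-in for it, and all function names are illustrative. The sketch also leaves out the data stream aspects and the boundary point distance threshold function.

```python
from collections import Counter, deque
from itertools import product


def grid_counts(points, cell_size):
    """Map each point to its grid cell and count the points per cell."""
    return Counter(tuple(int(x // cell_size) for x in p) for p in points)


def neighbors(cell):
    """Yield the cells adjacent to `cell` (sharing a face, edge, or corner)."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if any(offset):
            yield tuple(c + o for c, o in zip(cell, offset))


def extended_density(counts, neighbor_weight=0.25):
    """Return each cell's own count plus a weighted share of adjacent counts.

    `neighbor_weight` is an assumed constant standing in for the paper's
    cohesion degree, which the abstract does not define.
    """
    return {
        cell: n + neighbor_weight * sum(counts.get(nb, 0) for nb in neighbors(cell))
        for cell, n in counts.items()
    }


def merge_dense_cells(density, threshold):
    """Group dense cells into clusters by adjacency (breadth-first search)."""
    dense = {cell for cell, d in density.items() if d >= threshold}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), set()
        while queue:
            cell = queue.popleft()
            cluster.add(cell)
            for nb in neighbors(cell):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(cluster)
    return clusters


points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (1.2, 0.5), (1.5, 0.5), (5.5, 5.5)]
counts = grid_counts(points, cell_size=1.0)
density = extended_density(counts)
clusters = merge_dense_cells(density, threshold=2.5)
```

In this toy input, cell `(1, 0)` holds only two points and would fall below the threshold of 2.5 on its own count; the contribution from the adjacent cell `(0, 0)` lifts it above the threshold, so both cells end up in one cluster while the isolated cell `(5, 5)` is left out.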

CLC Number: