Computer Engineering ›› 2008, Vol. 34 ›› Issue (9): 98-100. doi: 10.3969/j.issn.1000-3428.2008.09.035

• Software Technology and Database •

Grid-based Clustering Algorithm with Referential Parameters

ZHOU Yan-tao(1,2), YI Xing-dong(1), WU Zheng-guo(2)

  1. School of Computer and Communication, Hunan University, Changsha 410082; 2. College of Information and Electrical Engineering, Naval Engineering University, Wuhan 430033
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-05-05 Published: 2008-05-05

Abstract: A grid-based clustering algorithm with referential parameters, called GRPC, is proposed. By computing a density threshold array, it supplies users with effective referential parameters. With these parameters the algorithm not only meets general clustering requirements but also separates high-density clusters from low-density ones. This addresses a weakness of traditional grid clustering algorithms, which rarely take the data distribution into account when partitioning the grid and therefore lose clustering quality. Simulation experiments show that the algorithm handles clusters of arbitrary shape and size, identifies outliers and noise well, and achieves good accuracy.

Key words: grid, density threshold, clustering algorithm, data mining
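
For readers unfamiliar with grid-based clustering, the general idea behind the abstract can be sketched as follows. This is a minimal, generic illustration and not the GRPC procedure itself: the abstract does not describe how the density threshold array is computed, so the sketch assumes a single user-supplied threshold. The names `grid_cluster`, `cell_size`, and `density_threshold` are hypothetical and chosen only for this example.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, density_threshold):
    """Label each point with a cluster id, or -1 for noise.

    Generic grid-density clustering (an assumption for illustration),
    not the paper's GRPC algorithm.
    """
    # 1. Map every point to the integer coordinates of its grid cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(idx)

    # 2. A cell is dense when it holds at least `density_threshold` points.
    dense = {key for key, members in cells.items()
             if len(members) >= density_threshold}

    # 3. Merge dense cells that touch (share a face, edge, or corner)
    #    with a breadth-first search; points in sparse cells stay noise.
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cell, off))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster_id += 1
    return labels


# Two groups of nearby points plus one isolated point.
sample = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.1, 0.2), (1.3, 0.4),
          (5.0, 5.0), (5.2, 5.1), (5.3, 5.4), (9.9, 0.1)]
print(grid_cluster(sample, cell_size=1.0, density_threshold=2))
# prints [0, 0, 0, 0, 0, 1, 1, 1, -1]
```

With a single fixed threshold, a dense cluster embedded in a sparser one is merged with its surroundings or the sparse part is discarded as noise. That limitation is what the abstract's per-dataset referential parameters are meant to address.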
