Abstract: This paper presents GRPC, a grid-based clustering algorithm with referential parameters. By computing an array of density thresholds, the algorithm supplies users with effective referential parameters. With these parameters it not only meets general clustering requirements but also separates high-density clusters from low-density ones. This addresses the loss of clustering quality in traditional grid-based clustering algorithms, which seldom consider the data distribution when partitioning the grid. Simulation experiments show that the algorithm handles clusters of arbitrary shape and size, identifies outliers and noise well, and achieves good accuracy.
Key words:
grid,
density threshold,
clustering algorithm,
data mining
CLC number:
ZHOU Yan-tao (周炎涛); YI Xing-dong (易兴东); WU Zheng-guo (吴正国). Grid-based Clustering Algorithm with Referential Parameters (基于网格的带有参考参数的聚类算法)[J]. Computer Engineering (计算机工程), 2008, 34(9): 98-100.
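The abstract gives no algorithmic detail, so the following is only a minimal sketch of generic grid-density clustering, not the GRPC procedure itself. It assumes a uniform grid, a single density threshold, and adjacency that includes diagonal neighbours. The names `grid_cluster`, `candidate_thresholds`, `cell_size` and `density_threshold` are hypothetical. The helper `candidate_thresholds` is a simple stand-in for the idea of offering the user reference values; how the paper actually builds its density threshold array is not stated here.

```python
from collections import Counter, deque
from itertools import product


def grid_cluster(points, cell_size, density_threshold):
    """Label each point with a cluster id, or -1 for noise.

    Generic grid-density clustering (an illustration, not GRPC itself):
    cells holding at least `density_threshold` points are dense, and
    dense cells that touch (including diagonally) form one cluster.
    """
    # Map every point to the integer coordinates of its grid cell.
    cells = [tuple(int(c // cell_size) for c in p) for p in points]
    counts = Counter(cells)
    dense = {cell for cell, n in counts.items() if n >= density_threshold}

    # All neighbour offsets in this dimensionality, excluding (0, ..., 0).
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        # Breadth-first flood fill over neighbouring dense cells.
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(a + b for a, b in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Points in non-dense cells are reported as noise.
    return [cell_label.get(cell, -1) for cell in cells]


def candidate_thresholds(points, cell_size):
    """Distinct cell occupancy counts, ascending (hypothetical helper)."""
    cells = [tuple(int(c // cell_size) for c in p) for p in points]
    return sorted(set(Counter(cells).values()))


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
           (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),
           (5.1, 5.2), (5.3, 5.4), (5.2, 5.1),
           (9.5, 0.5)]
    print(candidate_thresholds(pts, 1.0))
    print(grid_cluster(pts, 1.0, 3))
```

In this sketch, raising `density_threshold` turns sparsely populated cells into noise while denser regions stay clustered, which is one simple way to see why a list of candidate thresholds could help a user separate high-density clusters from low-density ones.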