Abstract: The quality of a clustering algorithm directly determines the quality of the resulting clusters. This paper examines the classical k-means clustering algorithm and identifies its weaknesses, including poor handling of symbolic data and sensitivity to noise and outliers. A modified k-means clustering algorithm based on weights is proposed to overcome these weaknesses, and its complexity is analyzed theoretically. Experiments show that, compared with the traditional mean-based method, the modified algorithm effectively improves clustering quality.
Key words: cluster algorithm; k-means; weights; cluster data mining
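To illustrate why weighting can reduce sensitivity to outliers, the following is a minimal sketch of a k-means variant whose centroids are weighted means. The abstract does not state how the paper assigns weights, so this is not the authors' algorithm: the function name `weighted_kmeans`, the caller-supplied `weights` list, and the sample data are all assumptions made for illustration.

```python
from math import dist


def weighted_kmeans(points, weights, centers, max_iter=100):
    """Lloyd-style k-means where each point contributes to its cluster
    centroid in proportion to a caller-supplied weight (an assumption;
    the paper's own weighting scheme is not given in the abstract)."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        # Update step: weighted mean of the members of each cluster.
        new_centers = []
        for j, old in enumerate(centers):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == j]
            total = sum(w for _, w in members)
            if total == 0:
                new_centers.append(old)  # no weighted members: keep old center
                continue
            new_centers.append(tuple(
                sum(w * p[d] for p, w in members) / total
                for d in range(len(old))
            ))
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return centers, labels


# Hypothetical data: two tight groups plus one outlier near the first group.
points = [(0, 0), (0, 2), (2, 0), (2, 2),
          (10, 10), (10, 12), (12, 10), (12, 12),
          (1, -9)]
init = [(0, 0), (12, 12)]

plain, _ = weighted_kmeans(points, [1.0] * len(points), init)
damped, _ = weighted_kmeans(points, [1.0] * 8 + [0.1], init)
print(plain[0], damped[0])
```

With uniform weights the sketch reduces to ordinary k-means, and the outlier drags the first centroid away from the group's center at (1, 1). Giving the outlier a small weight keeps the centroid much closer to that center. How the weights should be chosen is exactly what a weighting scheme must define, and that is left open here.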
Chinese Library Classification (CLC) number:
SUN Shibao (孙士保); QIN Keyun (秦克云). Research on Modified k-means Data Cluster Algorithm[J]. Computer Engineering (计算机工程), 2007, 33(13): 200-201.