Abstract:
The traditional k-means algorithm is sensitive to the choice of initial cluster centers. To address this problem, a new method for selecting the initial centers is proposed. The method first computes the density of the region in which each data object lies. It then selects k data objects that all lie in high-density regions and are as far apart from one another as possible, and uses these k objects as the initial cluster centers. Experiments on standard UCI datasets show that the proposed method produces clustering results of high purity and eliminates the sensitivity to the initial centers.
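The abstract does not say how density is measured, how a "high-density region" is thresholded, or how the k mutually distant points are found. The sketch below fills those gaps with its own assumptions, not the paper's definitions. A point's density is the number of other points within a radius (by default the mean pairwise distance). "High density" means at least the mean density. "As far apart as possible" is approximated by greedy max-min selection that starts from the densest point. The function names `density_based_initial_centers` and `kmeans` are hypothetical. The plain Lloyd loop is included only to show how the selected centers seed k-means.

```python
import numpy as np


def density_based_initial_centers(X, k, radius=None):
    """Choose k initial centers that lie in high-density regions and are far apart.

    Assumptions (not taken from the paper): density = number of other points
    within `radius`; "high density" = density >= mean density; the mutual
    distance is maximized greedily (max-min selection), not exactly.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distance matrix, shape (n, n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    if radius is None:
        # Hypothetical default: mean distance over all ordered distinct pairs.
        radius = dist.sum() / (n * (n - 1))
    # Number of neighbours within the radius, excluding the point itself.
    density = (dist <= radius).sum(axis=1) - 1
    candidates = np.flatnonzero(density >= density.mean())
    if len(candidates) < k:
        raise ValueError("fewer than k high-density candidates")
    # Seed with the densest candidate (ties go to the lowest index).
    chosen = [candidates[np.argmax(density[candidates])]]
    while len(chosen) < k:
        # For each candidate, the distance to its nearest already-chosen center.
        nearest = dist[np.ix_(candidates, chosen)].min(axis=1)
        # Take the candidate that is farthest from every chosen center.
        chosen.append(candidates[np.argmax(nearest)])
    return X[chosen]


def kmeans(X, centers, max_iter=100):
    """Plain Lloyd iterations seeded with the given initial centers."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1).argmin(axis=1)
        # Move each center to the mean of its points; keep it if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    init = density_based_initial_centers(X, k=2)
    labels, centers = kmeans(X, init)
    print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

On this toy data the two seeds come from different dense groups, so the result does not depend on a random draw. Greedy max-min selection is a cheap stand-in: choosing the k points whose pairwise separation is exactly maximal would require a combinatorial search.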
Key words:
Data mining,
Clustering,
K-means algorithm,
Clustering center
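Abstract (Chinese version, translated): The traditional k-means algorithm is sensitive to the initial cluster centers, and its clustering results fluctuate with different initial inputs. To eliminate this sensitivity, a method for optimizing the initial cluster centers is proposed. The method computes the density of the region in which each data object lies and selects, as the initial cluster centers, the k points in high-density regions that are farthest apart from one another. Experiments show that the improved k-means algorithm produces higher-quality clustering results and eliminates the sensitivity to the initial input.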
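The English abstract reports clustering quality on the labeled UCI data as purity. The paper's evaluation code is not given here, so the following is only a minimal sketch of the standard purity measure. It assumes two equal-length sequences: predicted cluster labels and ground-truth class labels. The function name `purity` is illustrative. Each cluster is credited with the number of its members that belong to its most frequent true class, and the total is divided by the number of points.

```python
from collections import Counter


def purity(labels_pred, labels_true):
    """Share of points that belong to the majority true class of their cluster."""
    if len(labels_pred) != len(labels_true) or len(labels_true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    # Group the true class labels by predicted cluster.
    members = {}
    for cluster, cls in zip(labels_pred, labels_true):
        members.setdefault(cluster, []).append(cls)
    # For each cluster, count how many points carry its most frequent class.
    hits = sum(Counter(c).most_common(1)[0][1] for c in members.values())
    return hits / len(labels_true)


print(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"]))  # 0.8333333333333334
```

A purity of 1.0 means every cluster contains a single class. The measure rewards many small clusters, so it is only meaningful when k is fixed, as it is in k-means.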
Key words (Chinese version, translated):
Data mining,
Clustering,
k-means algorithm,
Clustering center
YUAN Fang; ZHOU Zhiyong; SONG Xin. K-means Clustering Algorithm with Meliorated Initial Center[J]. Computer Engineering, 2007, 33(03): 65-66.
袁 方; 周志勇; 宋 鑫. 初始聚类中心优化的k-means算法 [k-means algorithm with optimized initial cluster centers][J]. 计算机工程 [Computer Engineering], 2007, 33(03): 65-66.