Computer Engineering ›› 2007, Vol. 33 ›› Issue (03): 65-66. doi: 10.3969/j.issn.1000-3428.2007.03.024

• Software Technology and Database •

K-means Clustering Algorithm with Meliorated Initial Center

YUAN Fang, ZHOU Zhiyong, SONG Xin

  1. (College of Mathematics and Computer, Hebei University, Baoding 071002)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-02-05 Published: 2007-02-05

Abstract: The traditional k-means algorithm is sensitive to the initial cluster centers: the clustering result fluctuates with different initial inputs. To eliminate this sensitivity, a method for optimizing the initial cluster centers is proposed. The method first computes the density of the region in which each data object lies, then selects as the initial cluster centers the k data objects that lie in high-density regions and are farthest from one another. Experiments on standard data sets from the UCI repository show that the improved k-means algorithm produces high-purity clustering results and eliminates the sensitivity to the initial centers.

Key words: Data mining, Clustering, K-means algorithm, Clustering center
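
The selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how density is measured or how the mutually farthest points are found, so the choices below are assumptions rather than the authors' procedure: density is the number of points within a hypothetical `radius`, "high density" means at least the mean density, and the farthest points are picked greedily. The function name `density_initial_centers` is likewise hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_initial_centers(points, k, radius):
    """Pick k initial centers that lie in dense regions and are far apart.

    Assumptions (not taken from the paper): the density of a point is the
    number of points within `radius` of it, a point is "high density" if its
    density is at least the mean density, and "farthest from one another" is
    approximated by a greedy farthest-point search.
    """
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    # Density: neighbours within the radius, the point itself included.
    density = [sum(1 for d in row if d <= radius) for row in dist]
    mean_density = sum(density) / n
    candidates = [i for i in range(n) if density[i] >= mean_density]
    if len(candidates) < k:
        raise ValueError("fewer than k high-density points; increase radius")
    # Start from the densest candidate (ties go to the lowest index).
    chosen = [max(candidates, key=lambda i: density[i])]
    while len(chosen) < k:
        remaining = [i for i in candidates if i not in chosen]
        # Take the candidate whose nearest chosen center is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][c] for c in chosen))
        chosen.append(nxt)
    return [points[i] for i in chosen]


# Two tight groups of four points plus one isolated outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
centers = density_initial_centers(points, k=2, radius=1.5)
print(centers)  # [(0, 0), (11, 11)]
```

In this toy data the outlier `(5, 20)` is the point farthest from `(0, 0)`, but it is excluded because its density is below the mean, so one center is taken from each dense group. The selection involves no random choice, so repeated runs on the same data start k-means from the same centers.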