Computer Engineering

• Artificial Intelligence and Recognition Technology •

K-means Algorithm with Initial Cluster Centers Optimized by Minimum Variance

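XIE Juan-ying, WANG Yan-e

  1. (School of Computer Science, Shaanxi Normal University, Xi'an 710062, China)
  • Received: 2013-05-10 Published: 2014-08-15 Released: 2014-08-15
  • About the authors: XIE Juan-ying (born 1971), female, associate professor, Ph.D.; main research interests: machine learning and data mining. WANG Yan-e, master's student.
  • Funding:
    National Natural Science Foundation of China (31372250); Shaanxi Province Key Science and Technology Program (2013K12-03-24); Fundamental Research Funds for the Central Universities (GK201102007).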

K-means Algorithm Based on Minimum Deviation Initialized Clustering Centers

XIE Juan-ying, WANG Yan-e

  1. (School of Computer Science, Shaanxi Normal University, Xi’an 710062, China)
  • Received: 2013-05-10 Online: 2014-08-15 Published: 2014-08-15

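Abstract: The traditional K-means algorithm selects its initial cluster centers at random, which tends to make the clustering results unstable, while K-means variants that optimize the initial centers require parameter choices that make the results less objective. To address this, a K-means algorithm is proposed that optimizes the initial cluster centers by minimum variance, based on how compactly the samples are distributed in space. The algorithm computes the variance of the samples' spatial distribution to measure compactness, then selects as initial cluster centers the samples that have the smallest variance (that is, the highest compactness) and lie a certain distance apart, yielding an optimized K-means clustering. Experiments on datasets from the UCI machine learning repository and on synthetic datasets containing noise show that the algorithm produces good and stable clustering results and is robust to noise.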

Keywords: clustering, K-means algorithm, variance, compactness, initial cluster centers

Abstract: The traditional K-means algorithm depends on randomly chosen seeds, and improved K-means algorithms that optimize the seeds give unstable clustering because their parameters are selected arbitrarily. To overcome both deficiencies, this paper proposes a new K-means clustering algorithm. The algorithm uses the distribution of the samples in a dataset and computes a deviation (variance) for each sample, on the principle that a sample's deviation reflects how densely other samples gather around it: the smaller the deviation, the more densely samples are gathered around that sample. The algorithm chooses as initial cluster centers the K samples that have the smallest deviations and lie far from each other. It is tested on UCI datasets and on synthetic datasets containing certain proportions of noise. The experimental results show that the proposed algorithm achieves good and stable clustering and is robust to noise.

Key words: clustering, K-means algorithm, deviation, intensive degree, initialized clustering centers
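
The seeding idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so two choices here are assumptions. The deviation of a sample is taken to be its mean squared Euclidean distance to all samples, and the required separation (`min_dist`, a hypothetical parameter) defaults to the mean pairwise distance. The function names `min_variance_seeds` and `kmeans` are likewise illustrative.

```python
import numpy as np


def min_variance_seeds(X, k, min_dist=None):
    """Pick k seeds: lowest-deviation samples that are mutually far apart."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # ASSUMPTION: deviation of sample i = mean squared distance to all samples.
    deviation = (dist ** 2).mean(axis=1)
    if min_dist is None:
        # ASSUMPTION: separation threshold = mean distance over distinct pairs.
        min_dist = dist.sum() / (n * (n - 1))
    chosen = []
    # Visit samples from smallest to largest deviation and keep those that
    # are at least min_dist away from every seed chosen so far.
    for i in np.argsort(deviation, kind="stable"):
        if all(dist[i, j] >= min_dist for j in chosen):
            chosen.append(int(i))
            if len(chosen) == k:
                break
    if len(chosen) < k:
        raise ValueError("min_dist is too large to place k seeds")
    return X[chosen]


def kmeans(X, centers, n_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        # Recompute each center; an empty cluster keeps its old center.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Calling `kmeans(X, min_variance_seeds(X, k))` runs the clustering from deterministic seeds, so repeated runs on the same data give the same result, which is the stability property the abstract emphasizes. The pairwise distance matrix costs O(n²) memory, so this sketch suits small datasets only.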
