K-means Algorithm Based on Minimum Deviation Initialized Clustering Centers

doi:10.3969/j.issn.1000-3428.2014.08.039

Computer Engineering

XIE Juan-ying, WANG Yan-e

(School of Computer Science, Shaanxi Normal University, Xi'an 710062, China)

Received: 2013-05-10 Online: 2014-08-15 Published: 2014-08-15
About the authors: XIE Juan-ying (born 1971), female, associate professor, Ph.D., research interests: machine learning and data mining; WANG Yan-e, master's student.
Funding: National Natural Science Foundation of China (31372250); Shaanxi Province Science and Technology Research Program (2013K12-03-24); Fundamental Research Funds for the Central Universities (GK201102007).

Abstract

The traditional K-means algorithm selects its initial cluster centers at random, which makes its clustering results unstable. Existing K-means variants that optimize the initial centers depend on parameters that must be chosen, so their results lack objectivity. To address both problems, this paper proposes a K-means algorithm that optimizes the initial cluster centers by minimum variance, using information about how compactly the samples are distributed in space. The algorithm computes a variance for each sample from the spatial distribution of the data; the smaller a sample's variance, the more densely other samples are gathered around it. It then chooses as initial cluster centers the K samples that have the smallest variance (that is, the highest compactness) and that lie at least a certain distance apart, and runs K-means from these centers. Experiments on data sets from the UCI Machine Learning Repository and on synthetic data sets containing noise show that the algorithm achieves good clustering results, that the results are stable, and that the algorithm is robust to noise.

Key words: clustering, K-means algorithm, variance, compactness, initial cluster centers
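The abstract does not state the exact variance formula or the separation distance, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' implementation. It assumes that a sample's "variance" is its mean squared Euclidean distance to all other samples, and that "far enough apart" means farther than a radius that defaults to the mean pairwise distance. The function names `min_variance_seeds` and `kmeans`, the `radius` parameter, and the toy data are all hypothetical.

```python
import math


def min_variance_seeds(points, k, radius=None):
    """Pick k seeds: lowest-variance samples that are mutually far apart.

    Assumptions (not given in the abstract): the variance of sample i is its
    mean squared distance to all other samples, and two seeds must be more
    than `radius` apart (default: the mean pairwise distance).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    variance = [sum(d * d for d in row) / (n - 1) for row in dist]
    if radius is None:
        radius = sum(map(sum, dist)) / (n * (n - 1))

    order = sorted(range(n), key=variance.__getitem__)
    seeds = []
    for i in order:
        # Accept a sample only if it is far from every seed chosen so far.
        if len(seeds) < k and all(dist[i][j] > radius for j in seeds):
            seeds.append(i)
    for i in order:
        # Fallback if the radius was too strict to yield k seeds.
        if len(seeds) < k and i not in seeds:
            seeds.append(i)
    return [points[i] for i in seeds]


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations started from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


# Toy data: two well-separated square clusters.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
seeds = min_variance_seeds(points, k=2)
centers, labels = kmeans(points, seeds)
print(sorted(centers))  # [(0.5, 0.5), (10.5, 10.5)]
```

Because the seeds are computed deterministically from the data, repeated runs give the same clustering, which is the stability property the abstract contrasts with random initialization. Under the assumed variance definition, the pairwise distance matrix costs O(n²) time and memory.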

CLC number: TP18

XIE Juan-ying, WANG Yan-e. K-means Algorithm Based on Minimum Deviation Initialized Clustering Centers[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2014.08.039.

http://www.ecice06.com/CN/Y2014/V40/I8/205


