
Computer Engineering


K-means Clustering Ensemble Based on MapReduce

JI Su-qin, SHI Hong-bo   

  1. (School of Information Management, Shanxi University of Finance & Economics, Taiyuan 030031, China)
  • Received: 2012-08-27; Online: 2013-09-15; Published: 2013-09-13

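  • About the authors: JI Su-qin (冀素琴, born 1972), female, lecturer, M.S., CCF member; main research interests: data mining and distributed technology. SHI Hong-bo (石洪波), professor, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (60873100); Natural Science Foundation of Shanxi Province (2010011022-1)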

Abstract: Traditional clustering algorithms have difficulty clustering massive data efficiently. To address this problem, this paper proposes a K-means clustering ensemble algorithm based on the MapReduce framework. The algorithm uses K-means to generate base clustering results with different numbers of clusters, improves the co-association matrix, combines the base results according to the number of times each pair of data points co-occurs in a cluster, and obtains the final clustering result automatically. Experimental results show that the algorithm effectively improves clustering quality, scales well, and is suitable for clustering analysis of massive data.

Key words: massive data, clustering, MapReduce framework, K-means algorithm, co-association matrix, clustering ensemble
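
The ensemble idea summarized in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation: the abstract does not describe how the co-association matrix is improved, so the code below uses the standard pair co-occurrence count, imitates the map and reduce steps with plain Python functions instead of a Hadoop job, and merges pairs with an assumed majority threshold of 0.5. All function names (`kmeans`, `map_pairs`, `reduce_counts`, `consensus`) and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's K-means on tuples of floats; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def map_pairs(labels):
    """Map step: emit ((i, j), 1) for each pair sharing a cluster in one base clustering."""
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)
    for members in groups.values():
        for i, j in combinations(members, 2):
            yield (i, j), 1


def reduce_counts(emitted):
    """Reduce step: sum the counts per pair, giving a sparse co-association matrix."""
    counts = defaultdict(int)
    for pair, one in emitted:
        counts[pair] += one
    return counts


def consensus(n, counts, runs, threshold=0.5):
    """Merge pairs that co-occur in more than `threshold` of the runs (union-find).

    The number of final clusters is not given in advance; it falls out of the merging.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), c in counts.items():
        if c / runs > threshold:
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy data (assumed for illustration): two well-separated 2-D blobs.
rng = random.Random(0)
points = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
points += [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(30)]

# Base clusterings with different numbers of clusters, then the ensemble step.
base = [kmeans(points, k, seed=k) for k in (2, 3, 4, 5)]
counts = reduce_counts(pair for labels in base for pair in map_pairs(labels))
final = consensus(len(points), counts, runs=len(base))
print("consensus clusters:", len(set(final)))
```

In a real MapReduce deployment each base clustering would presumably be processed by separate mappers and the pair counts summed by reducers keyed on the pair; the sketch only mirrors that data flow in memory.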
