K-means Clustering Ensemble Based on MapReduce

doi:10.3969/j.issn.1000-3428.2013.09.018

Computer Engineering

JI Su-qin, SHI Hong-bo

(School of Information Management, Shanxi University of Finance & Economics, Taiyuan 030031, China)

Received: 2012-08-27    Online: 2013-09-15    Published: 2013-09-13

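About the authors: JI Su-qin (冀素琴, born 1972), female, lecturer, M.S., CCF member; main research interests: data mining and distributed technology. SHI Hong-bo (石洪波), professor, Ph.D.
Supported by: National Natural Science Foundation of China (60873100); Natural Science Foundation of Shanxi Province (2010011022-1)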

Abstract: Traditional clustering algorithms have difficulty performing cluster analysis on massive data efficiently. To address this problem, this paper proposes a K-means clustering ensemble algorithm based on the MapReduce framework. The algorithm uses K-means to generate component clusterings with different numbers of clusters, improves the co-association matrix, combines the components according to the number of times each pair of samples co-occurs in the same cluster, and obtains the final clustering automatically. Experimental results show that the algorithm effectively improves clustering quality, scales well, and is suitable for cluster analysis of massive data.

Key words: massive data, clustering, MapReduce framework, K-means algorithm, co-association matrix, clustering ensemble
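
The pipeline outlined in the abstract (component K-means runs with different cluster counts, pair co-occurrence counting, consensus from the counts) can be illustrated with a small single-machine sketch. This is not the authors' implementation: the abstract does not describe how the co-association matrix is improved or how the work is split across MapReduce jobs, so the code below uses the plain co-occurrence count and, as an assumption, forms the final clusters by linking pairs that share a cluster in more than half of the runs. All function names and the `threshold` parameter are hypothetical.

```python
import random
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if the cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def co_association(labelings, n):
    """Count in how many labelings each pair (i, j), i < j, shares a cluster.

    Emitting ((i, j), 1) per run and summing per key is the shape a
    map phase and a reduce phase would take; that mapping is an assumption.
    """
    counts = {}
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return counts

def consensus(counts, n, runs, threshold=0.5):
    """Link pairs co-occurring in more than `threshold` of the runs.

    Connected groups (found with union-find) become the final clusters,
    so the number of clusters is not fixed in advance.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), c in counts.items():
        if c / runs > threshold:
            parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

if __name__ == "__main__":
    # Toy data: two well-separated groups of three points each.
    points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
              (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
    labelings = [kmeans(points, k, seed=k) for k in (2, 3, 4)]
    counts = co_association(labelings, len(points))
    print(consensus(counts, len(points), runs=len(labelings)))
```

The pair-count dictionary grows quadratically with the number of samples, which is one reason a distributed formulation is attractive for large data sets.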

CLC Number: TP311

JI Su-qin, SHI Hong-bo. K-means Clustering Ensemble Based on MapReduce[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2013.09.018.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2013.09.018

http://www.ecice06.com/EN/Y2013/V39/I9/84
