Computer Engineering

• Artificial Intelligence and Recognition Technology •

Secondary Parallel k-means Clustering Algorithm Based on Protocol Group-reduced Strategy

SHEN Junxin 1, GUO Xiaojun 1, WANG Wenhao 1, YANG Xu 1,2

  1. Faculty of Management and Economics, Kunming University of Science and Technology, Kunming 650093, China
  2. Department of Industrial Information Technology, Yunnan Vocational College of Mechanical & Electrical Technology, Kunming 650203, China
  • Received: 2014-12-01 Online: 2015-08-15 Published: 2015-08-15
  • About the authors: SHEN Junxin (born 1978), male, associate professor, Ph.D.; main research interests: data clustering and complex networks. GUO Xiaojun and WANG Wenhao, master's students. YANG Xu, lecturer, M.S.
  • Funding:
    Supported by the National Natural Science Foundation of China (61303234, 61263022).

Abstract: To address the high time complexity of multipoint interface communication under the traditional MapReduce framework, a secondary parallel k-means clustering algorithm is proposed for big data clustering. The algorithm adopts a group-reduce operation strategy: a group membership management protocol is defined to manage the members of the operation group, and broadcast, delete, and add operations on the group members' reference list pID implement group-reduce-based synchronization, which lowers the time complexity of the algorithm. An intermediate buffer cluster count is defined and combined with the k-means algorithm to reduce the amount of data fed into the group-reduce operation of the secondary parallel clustering algorithm, which lowers the time complexity further. Simulation experiments on a self-built big data test set show that the algorithm effectively speeds up clustering while maintaining clustering accuracy.

Key words: protocol group-reduced, parallel, k-means clustering algorithm, big data, MapReduce model
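
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: each partition is first clustered locally into an intermediate number of buffer clusters, and only those weighted centroids are passed to a second k-means pass. Everything concrete here is an assumption made for illustration and is not taken from the paper: the names `two_stage_kmeans`, `k_buffer`, and `n_parts`, the farthest-first seeding, and the sequential loop that stands in for parallel workers. The group membership protocol and the pID list operations are not modeled.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding (an assumption of this sketch, not from the paper)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point whose nearest chosen center is farthest away.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float],
                    k: int, iters: int = 20) -> Tuple[List[Point], List[float]]:
    """Lloyd's iterations on weighted points; returns (centroids, masses)."""
    centroids = farthest_first(points, k)
    dim = len(points[0])
    masses = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        masses = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: dist2(p, centroids[c]))
            masses[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # An empty cluster keeps its previous centroid and a mass of zero.
        centroids = [
            tuple(s / masses[j] for s in sums[j]) if masses[j] > 0 else centroids[j]
            for j in range(k)
        ]
    return centroids, masses


def two_stage_kmeans(data: Sequence[Point], k: int, k_buffer: int,
                     n_parts: int) -> Tuple[List[Point], List[float]]:
    """Stage 1: local k-means per partition. Stage 2: k-means on the summaries."""
    parts = [data[i::n_parts] for i in range(n_parts)]
    summaries: List[Point] = []
    summary_masses: List[float] = []
    for part in parts:  # each iteration stands in for one parallel worker
        if not part:
            continue
        cents, masses = weighted_kmeans(part, [1.0] * len(part), k_buffer)
        for c, m in zip(cents, masses):
            if m > 0:  # drop empty buffer clusters
                summaries.append(c)
                summary_masses.append(m)
    # The second pass sees at most n_parts * k_buffer weighted points,
    # not the full data set.
    return weighted_kmeans(summaries, summary_masses, k)
```

Because each summary carries the number of points it represents, the second pass computes mass-weighted means, so the final masses still add up to the size of the original data set. The larger `k_buffer` is, the closer the result should track a single full k-means run, at the cost of more data entering the second stage.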

CLC number: