Abstract:
To address the high complexity of multipoint interface communication strategies under the traditional MapReduce framework, a secondary parallel k-means clustering algorithm based on a protocol group-reduced strategy is proposed. A group membership management protocol is defined to manage the members of an operation group. By broadcasting, deleting, and adding entries in the group reference list pID, the protocol realizes group-reduced synchronous operations, which lowers the time complexity of the algorithm. A number of intermediate buffer clusters is defined and combined with the k-means algorithm to reduce the input data of the secondary parallel clustering stage, and hence the amount of group operations, which further lowers the time complexity. Simulation experiments on the test data set show that the strategy substantially improves the efficiency of the algorithm while maintaining sufficient clustering precision.
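The abstract does not give the protocol's message formats or semantics, so the following is only a minimal in-process sketch, assuming that "group-reduced synchronous operation" means: every current member contributes one local value, the values are combined with a reduction operator, and the result is broadcast back to all members. The class `GroupMembership` and its methods `add`, `delete`, `broadcast`, and `group_reduce` are hypothetical names chosen for illustration; they are not the authors' protocol, and a real deployment would exchange these messages over a network rather than through in-memory inboxes.

```python
from functools import reduce
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class GroupMembership:
    """Hypothetical model of an operation group tracked by a reference list pID."""

    def __init__(self, pids: List[int]) -> None:
        self.pids: List[int] = list(pids)  # the group reference list pID
        self.inbox: Dict[int, List[object]] = {p: [] for p in self.pids}

    def add(self, pid: int) -> None:
        """Add a member to pID and give it an empty inbox."""
        if pid not in self.pids:
            self.pids.append(pid)
            self.inbox[pid] = []

    def delete(self, pid: int) -> None:
        """Remove a member from pID (e.g. once it no longer takes part)."""
        if pid in self.pids:
            self.pids.remove(pid)
            del self.inbox[pid]

    def broadcast(self, message: object) -> None:
        """Deliver the same message to every current member."""
        for pid in self.pids:
            self.inbox[pid].append(message)

    def group_reduce(self, local: Dict[int, T], op: Callable[[T, T], T]) -> T:
        """Combine one local value per current member, then broadcast the
        result so that all members hold the same reduced value."""
        missing = [p for p in self.pids if p not in local]
        if missing:
            raise ValueError(f"members without a local value: {missing}")
        # reduce() raises TypeError if the group is empty.
        result = reduce(op, (local[p] for p in self.pids))
        self.broadcast(result)
        return result


# Usage: three members, one joins, one leaves, then a sum is group-reduced.
group = GroupMembership([0, 1, 2])
group.add(3)
group.delete(1)
total = group.group_reduce({0: 5, 2: 7, 3: 1}, lambda a, b: a + b)
print(total, group.pids)  # 13 [0, 2, 3]
```

Only members still listed in pID take part in the reduction, which is one way a membership list can cut the number of point-to-point exchanges compared with every task talking to every other task.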
Key words:
protocol group-reduced,
parallel,
k-means clustering algorithm,
big data,
MapReduce model
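Abstract (Chinese version, translated): To address the excessive time complexity of multipoint interface communication under the traditional MapReduce framework, a secondary parallel k-means clustering algorithm is proposed for clustering big data. The algorithm adopts a group-reduced operation strategy: a group membership management protocol is defined to manage the members of an operation group, and group-reduced synchronous operations are realized by broadcasting, deleting, and adding entries in the members' reference list pID. A number of intermediate buffer clusters is defined and combined with the k-means algorithm to reduce the amount of input data for the group-reduced operations of the secondary parallel clustering algorithm, further lowering its time complexity. Simulation results on a self-built big data test set show that the algorithm effectively speeds up clustering while maintaining clustering precision.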
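The abstract does not say exactly how the intermediate buffer clusters feed the secondary clustering stage. One plausible reading, sketched here purely as an assumption, is a two-stage k-means: each data partition (standing in for a map task) is first clustered into m intermediate buffer clusters with m larger than k, and only the weighted buffer centroids, not the raw points, are then clustered into the final k clusters. The functions `weighted_kmeans` and `secondary_kmeans` and the parameter `m` are hypothetical; no MapReduce or Hadoop API is used.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points: List[Point], weights: List[float], k: int,
                    iters: int = 50, seed: int = 0) -> Tuple[List[Point], List[float]]:
    """Lloyd's k-means on weighted points. Returns the centroids and, for each
    centroid, the total weight of the points whose weighted mean it is."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # needs len(points) >= k
    dim = len(points[0])
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Empty clusters keep their previous centroid.
        new = [tuple(s / mass[j] for s in sums[j]) if mass[j] > 0 else centroids[j]
               for j in range(k)]
        if new == centroids:
            break
        centroids = new
    return centroids, mass


def secondary_kmeans(partitions: List[List[Point]], k: int, m: int,
                     seed: int = 0) -> List[Point]:
    """Stage 1: cluster each partition into m buffer clusters (weight = size).
    Stage 2: cluster the weighted buffer centroids into k final clusters."""
    buf_pts: List[Point] = []
    buf_w: List[float] = []
    for i, part in enumerate(partitions):
        cents, sizes = weighted_kmeans(part, [1.0] * len(part),
                                       min(m, len(part)), seed=seed + i)
        for c, s in zip(cents, sizes):
            if s > 0:
                buf_pts.append(c)
                buf_w.append(s)
    final, _ = weighted_kmeans(buf_pts, buf_w, k, seed=seed)
    return final


# Usage: two well-separated 2-D blobs split across three partitions.
rng = random.Random(42)
data = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(300)]
        + [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(300)])
rng.shuffle(data)
parts = [data[i::3] for i in range(3)]
centers = sorted(secondary_kmeans(parts, k=2, m=6))
print(centers)  # two centroids, one near (0, 0) and one near (10, 10)
```

The second stage sees at most `m` points per partition instead of the whole partition, which is the kind of input reduction the abstract attributes to the buffer clusters; weighting each buffer centroid by its cluster size keeps the final centroids close to the means a single-pass k-means would produce on well-separated data.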
SHEN Junxin, GUO Xiaojun, WANG Wenhao, YANG Xu. Secondary Parallel k-means Clustering Algorithm Based on Protocol Group-reduced Strategy[J]. Computer Engineering.
沈俊鑫,郭晓军,王文浩,杨旭. 基于协议组降低策略的二次并行k均值聚类算法[J]. 计算机工程.