摘要: 为提高聚类算法效率,提出一种基于动态云平台的快速闭树聚类并行算法。针对云计算平台Hadoop中任务的随机分配策略,给出一个满足最小化消耗成本的任务分配算法CDA-GA,并基于该算法提出动态云平台模型。将传统的频繁闭树挖掘算法与聚类算法并行化,应用于动态云平台中,设计基于动态云平台的闭树聚类算法框架。实验结果表明,该算法有效可行,适合在大规模数据下进行聚类分析。
关键词: 数据挖掘, 云计算, 并行计算, 闭树, 树聚类, 海量数据
Abstract: To improve the efficiency of clustering algorithms, this paper proposes a fast closed tree clustering parallel algorithm based on a dynamic cloud platform. To address the random task allocation strategy of the cloud computing platform Hadoop, it presents a task allocation algorithm, CDA-GA, that minimizes consumption cost, and builds a dynamic cloud platform model on top of this algorithm. The traditional frequent closed tree mining algorithm and the clustering algorithm are then parallelized and applied to the dynamic cloud platform, yielding a closed tree clustering algorithm framework based on the dynamic cloud platform. Experimental results show that the algorithm is effective and feasible, and that it is suitable for clustering analysis of large-scale data.
Key words: data mining, cloud computing, parallel computing, closed tree, tree clustering, massive data
中图分类号:
郭鑫,颜一鸣,徐洪智,覃遵跃. 动态云平台下的快速闭树聚类并行算法[J]. 计算机工程.
GUO Xin, YAN Yi-ming, XU Hong-zhi, QIN Zun-yue. Fast Closed Tree Clustering Parallel Algorithm for Dynamic Cloud Platform[J]. Computer Engineering.