Author Login Editor-in-Chief Peer Review Editor Work Office Work

Computer Engineering ›› 2009, Vol. 35 ›› Issue (20): 71-72. doi: 10.3969/j.issn.1000-3428.2009.20.024

• Software Technology and Database • Previous Articles     Next Articles

FP-Growth Parallel Algorithm in Cluster System

CHEN Min1, LI Hui-fei2   

  1. (1. School of Economics and Management, University of Science and Technology Beijing, Beijing 100083; 2. IBM Global Services (China) Company Limited, Beijing 100027)
  • Received:1900-01-01 Revised:1900-01-01 Online:2009-10-20 Published:2009-10-20

集群系统中的FP-Growth并行算法

陈 敏1,李徽翡2   

  1. (1. 北京科技大学经济管理学院,北京 100083;2. 国际商业机器全球服务(中国)有限公司,北京 100027)

Abstract: When the dataset size is huge, both the memory usage and computational cost of FP-Growth algorithm are expensive. This paper proposes a parallel algorithm, which is designed to run on the PC cluster. This algorithm finds all the conditional pattern bases of frequent items by the projection method. It splits the mining task into number of independent sub-tasks, executes these sub-tasks in parallel on nodes and aggregates the sub-results back for the final result. Experiments show that this parallel algorithm not only can accelerate the computational speed, avoids the memory overflow, but also achieves much better scalability than the FP-Growth algorithm.

Key words: FP-Growth algorithm, PC cluster, parallel algorithm

摘要: 针对FP-Growth算法面临大规模数据库时空效率不高的问题,提出一种面向计算机集群的并行算法。采用投影方法直接寻找频繁项的条件数据库,将挖掘条件数据库的工作分化成若干独立的子任务,分配到集群中的节点上并行实现,由中央节点汇总结果并输出。结果证明,该算法不仅能够提高计算速度,解决数据库规模过大时内存溢出的情况,且具有良好的延展性。

关键词: FP-Growth算法, 计算机集群, 并行算法

CLC Number: