作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2009, Vol. 35 ›› Issue (20): 71-72. doi: 10.3969/j.issn.1000-3428.2009.20.024

• 软件技术与数据库 • 上一篇    下一篇

集群系统中的FP-Growth并行算法

陈 敏1,李徽翡2   

  1. (1. 北京科技大学经济管理学院,北京 100083;2. 国际商业机器全球服务(中国)有限公司,北京 100027)
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2009-10-20 发布日期:2009-10-20

FP-Growth Parallel Algorithm in Cluster System

CHEN Min1, LI Hui-fei2   

  1. (1. School of Economics and Management, University of Science and Technology Beijing, Beijing 100083; 2. IBM Global Services (China) Company Limited, Beijing 100027)
  • Received:1900-01-01 Revised:1900-01-01 Online:2009-10-20 Published:2009-10-20

摘要: 针对FP-Growth算法面临大规模数据库时空效率不高的问题,提出一种面向计算机集群的并行算法。采用投影方法直接寻找频繁项的条件数据库,将挖掘条件数据库的工作分化成若干独立的子任务,分配到集群中的节点上并行实现,由中央节点汇总结果并输出。结果证明,该算法不仅能够提高计算速度,解决数据库规模过大时内存溢出的情况,且具有良好的延展性。

关键词: FP-Growth算法, 计算机集群, 并行算法

Abstract: When the dataset size is huge, both the memory usage and computational cost of FP-Growth algorithm are expensive. This paper proposes a parallel algorithm, which is designed to run on the PC cluster. This algorithm finds all the conditional pattern bases of frequent items by the projection method. It splits the mining task into number of independent sub-tasks, executes these sub-tasks in parallel on nodes and aggregates the sub-results back for the final result. Experiments show that this parallel algorithm not only can accelerate the computational speed, avoids the memory overflow, but also achieves much better scalability than the FP-Growth algorithm.

Key words: FP-Growth algorithm, PC cluster, parallel algorithm

中图分类号: