计算机工程 ›› 2019, Vol. 45 ›› Issue (3): 32-35,40.doi: 10.19678/j.issn.1000-3428.0049606

所属专题: 云计算与大数据专题

• 云计算与大数据专题 • 上一篇    下一篇

基于负载均衡的并行FP-Growth算法

高权,万晓冬   

  1. 南京航空航天大学 自动化学院,南京 211106
  • 收稿日期:2017-12-07 出版日期:2019-03-15 发布日期:2019-03-15
  • 作者简介:高权(1990—),男,硕士研究生,主研方向为大数据分析、数据挖掘;万晓冬,副研究员
  • 基金项目:

    国家部委基金。

Parallel FP-Growth Algorithm Based on Load Balance

GAO Quan,WAN Xiaodong   

  1. College of Automation Engineering,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
  • Received:2017-12-07 Online:2019-03-15 Published:2019-03-15

摘要:

针对FP-Growth算法查找操作时间复杂度较高的问题,提出一种新的算法LBPFP。在PFP算法基础上,将哈希表加入链头表以实现项地址的快速访问,并设计基于前缀长度的计算量模型,优化并行流程,提升算法的执行效率。在webdocs.dat数据库上进行对比实验,结果表明,LBPFP算法比PFP、HPFP、DPFP算法具有更高的频繁项集挖掘效率。

关键词: Spark平台, 频繁模式增长, 并行, 负载均衡, 链头表, 计算量模型

Abstract:

Aiming at the problem that the lookup operation of FP-Growth algorithm has a high time complexity,this paper proposes a new algorithm named LBPFP.The algorithm is based on PFP algorithm,which is added a hash table to the head table to achieve fast access to item and is designed a workload model based on the prefix length to optimize the parallel process and improve the efficiency of the algorithm.The comparison experiments in the webdocs.dat database show that the LBPFP algorithm has better performance than the PFP,HPFP and DPFP algorithms.

Key words: Spark platform, FP-Growth, parallel, load balance, head table, workload model

中图分类号: