Computer Engineering

Special topic: Cloud Computing


Incremental Updating Algorithm of Parallel Association Rule Based on MapReduce

CHENG Guang, WANG Xiaofeng

  1. (College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China)
  • Received: 2015-02-12 Online: 2016-02-15 Published: 2016-01-29
  • About the authors: CHENG Guang (born 1991), male, master's degree candidate, research interests: cloud computing and data mining; WANG Xiaofeng, professor and doctoral supervisor.

Abstract: Traditional association rule mining algorithms run inefficiently in big data environments, where data volumes grow rapidly. To address this problem, this paper proposes a parallel incremental updating algorithm for association rules, oriented to big data and based on the Frequent Pattern Growth (FP-growth) algorithm. Each step of FP-growth is parallelized using the MapReduce programming model on a cloud computing platform. During incremental mining, the existing frequent itemsets and 1-itemsets are used to construct a frequent pattern tree for the newly added transaction set, and the frequent itemsets are then updated by scanning the original transaction database once. Experimental results show that, compared with traditional association rule mining algorithms, the proposed algorithm achieves higher mining efficiency and better scalability, and is therefore suitable for incremental association rule mining on massive data.

Key words: big data, cloud computing, MapReduce programming model, frequent itemset, incremental updating, association rule
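
The incremental idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: brute-force itemset enumeration stands in for the frequent pattern tree, plain Python generators stand in for the MapReduce map and reduce phases, and every name (`map_phase`, `reduce_phase`, `mine`, `incremental_update`, `max_len`) is hypothetical. It assumes the stored frequent itemsets cover all itemsets of up to `max_len` items that were frequent in the original database. The sketch relies on the fact that an itemset frequent in the combined database must be frequent in the original database or in the increment, so only itemsets newly frequent in the increment require a scan of the original data.

```python
from collections import Counter
from itertools import combinations


def map_phase(transactions, max_len):
    """Map step: emit (itemset, 1) for each itemset of up to max_len items."""
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                yield itemset, 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per itemset key."""
    counts = Counter()
    for itemset, value in pairs:
        counts[itemset] += value
    return counts


def mine(transactions, min_support, max_len=3):
    """Full (non-incremental) mining, used here only as a baseline."""
    threshold = min_support * len(transactions)
    counts = reduce_phase(map_phase(transactions, max_len))
    return {s: c for s, c in counts.items() if c >= threshold}


def count_candidates(transactions, candidates):
    """One pass over transactions, counting only the given candidate itemsets."""
    counts = {c: 0 for c in candidates}
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if items.issuperset(candidate):
                counts[candidate] += 1
    return counts


def incremental_update(old_frequent, original_db, increment, min_support, max_len=3):
    """Update frequent itemsets after appending `increment` to `original_db`.

    old_frequent maps each itemset frequent in original_db to its count there.
    """
    total = len(original_db) + len(increment)
    inc_counts = reduce_phase(map_phase(increment, max_len))

    # Itemsets already known to be frequent: just add their increment counts.
    updated = {s: c + inc_counts.get(s, 0) for s, c in old_frequent.items()}

    # Itemsets frequent in the increment but not previously frequent:
    # their original counts are unknown, so scan the original database once.
    inc_threshold = min_support * len(increment)
    newcomers = [s for s, c in inc_counts.items()
                 if c >= inc_threshold and s not in old_frequent]
    scanned = count_candidates(original_db, newcomers)
    for s in newcomers:
        updated[s] = scanned[s] + inc_counts[s]

    # Keep only itemsets that meet the support threshold on the combined data.
    return {s: c for s, c in updated.items() if c >= min_support * total}
```

A usage example, comparing the incremental result with full re-mining on a toy dataset:

```python
original = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
increment = [["b", "d"], ["b", "d"]]
old = mine(original, 0.5)
print(incremental_update(old, original, increment, 0.5) == mine(original + increment, 0.5))
```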

CLC Number: