Abstract: To address the low efficiency of traditional association rule mining algorithms in big data environments, this paper proposes a parallel incremental updating algorithm for association rules based on the Frequent Pattern Growth (FP-growth) algorithm. Each step of FP-growth is parallelized using the MapReduce programming model on a cloud computing platform. During incremental mining, the existing frequent itemsets and 1-itemsets are used to build a frequent pattern tree for the newly added transactions, and the frequent itemsets are then updated by scanning the original transaction database once. Experimental results show that, compared with traditional association rule mining algorithms, the proposed algorithm achieves higher mining efficiency and better scalability, making it suitable for incremental association rule mining on massive data.
Key words:
big data,
cloud computing,
MapReduce programming model,
frequent itemset,
incremental updating,
association rule
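
The incremental idea summarized in the abstract can be illustrated for the simplest case, the 1-itemsets. The following is a minimal sketch under stated assumptions, not the algorithm of this paper: it runs in a single Python process rather than on a MapReduce cluster, it covers only the support counts of single items (not the frequent pattern tree construction), and the names `map_count`, `update_frequent_items`, `old_counts`, and `min_support` are hypothetical. It shows how stored counts from the original database might be combined with counts computed in parallel-style map and reduce steps over the newly added transactions.

```python
from collections import Counter
from functools import reduce


def map_count(partition):
    """Map step (illustrative): count items in one partition of new transactions."""
    counts = Counter()
    for transaction in partition:
        # Each item contributes at most once per transaction.
        counts.update(set(transaction))
    return counts


def update_frequent_items(old_counts, old_size, new_partitions, min_support):
    """Merge stored 1-itemset counts with counts from the new transactions.

    old_counts  : item -> support count in the original database (assumed stored)
    old_size    : number of transactions in the original database
    min_support : relative minimum support in [0, 1] (hypothetical parameter)
    """
    # Reduce step: sum the per-partition counters.
    new_counts = reduce(lambda a, b: a + b, map(map_count, new_partitions), Counter())
    new_size = sum(len(p) for p in new_partitions)

    total = Counter(old_counts) + new_counts
    threshold = min_support * (old_size + new_size)
    frequent = {item: c for item, c in total.items() if c >= threshold}
    return total, frequent


# Toy data: 4 original transactions (already counted) plus 4 new ones in 2 partitions.
old_counts = {"a": 3, "b": 2, "c": 1}
new_partitions = [
    [["a", "b"], ["a", "d"]],
    [["b", "d"], ["d"]],
]
total, frequent = update_frequent_items(old_counts, 4, new_partitions, 0.5)
print(frequent)
```

In this toy run the threshold is 0.5 × 8 = 4 transactions, so only items `a` and `b` remain frequent after the update. The original transactions are never re-read for the 1-itemset counts because their counts are assumed to have been kept from the earlier mining pass.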
CLC number: