
Computer Engineering

• Advanced Computing and Data Processing •

Improved Association Rule Mining Algorithm Based on Hadoop Platform

WANG Yingbo, MA Jing, CHAI Jiajia, ZHAO Bin

  1. (School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China)
  • Received: 2015-10-29 Online: 2016-10-15 Published: 2016-10-15
  • About the authors: WANG Yingbo (born 1964), male, professor, Ph.D., whose main research interests are data mining, digital mines, and big data; MA Jing and CHAI Jiajia, master's students; ZHAO Bin, M.S.

Abstract: As the number of data acquisition channels grows, association rule mining on a single processor is increasingly limited by I/O and memory. To address this problem, this paper improves the traditional mining algorithm. Leveraging the Hadoop platform, it reduces the algorithm's time complexity with an incremental (cumulative) iteration method, and, exploiting the characteristics of MapReduce programming, it completes frequent itemset mining with a single traversal of the data plus MapReduce task scheduling, which improves processing efficiency. In strong association rule mining, the Sqoop component migrates data from Hive external tables to Redis so that the data can be read at high speed. Experimental results show that the method effectively improves mining efficiency, that the improvement grows with the size of the dataset, and that the improved algorithm has good speedup and scalability, allowing it to mine association rules from large datasets quickly.

Key words: Hadoop platform, MapReduce programming, association rule, big data, data mining
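The abstract says frequent itemsets are found in a single traversal of the data using MapReduce, but gives no details. As a minimal sketch, assuming a simple "count every itemset up to a size cap in one pass" strategy, the map, shuffle, and reduce phases could be simulated in-process as below. The function names, the `max_len` cap, and the in-memory shuffle are hypothetical illustrations, not the paper's algorithm or the Hadoop API.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple

Itemset = FrozenSet[str]


def map_transaction(transaction: Iterable[str], max_len: int) -> Iterator[Tuple[Itemset, int]]:
    """Map phase (hypothetical): emit (itemset, 1) for every itemset of size 1..max_len in one transaction."""
    items = sorted(set(transaction))  # deduplicate so an itemset is counted at most once per transaction
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            yield frozenset(combo), 1


def shuffle(pairs: Iterable[Tuple[Itemset, int]]) -> Dict[Itemset, List[int]]:
    """Shuffle phase: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[Itemset, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: Itemset, values: List[int]) -> Tuple[Itemset, int]:
    """Reduce phase: an itemset's support count is the sum of its emitted 1s."""
    return key, sum(values)


def frequent_itemsets(transactions: List[List[str]], min_support: int, max_len: int = 3) -> Dict[Itemset, int]:
    """Read each transaction once, count all itemsets up to max_len, keep those meeting min_support."""
    pairs = (pair for t in transactions for pair in map_transaction(t, max_len))
    counts = dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    data = [["milk", "bread"], ["milk", "diapers", "beer"], ["milk", "bread", "diapers"], ["bread", "diapers"]]
    for itemset, count in sorted(frequent_itemsets(data, min_support=2).items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(itemset), count)
```

The single pass comes at a cost: each transaction of width w emits up to 2^w - 1 keys, which is why this sketch caps itemset size with `max_len`. Apriori-style candidate pruning trades that cost for extra passes over the data.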
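The abstract places support data in Redis so that strong-rule mining can read it quickly. Strong rules are usually derived from frequent itemsets by the standard confidence test, conf(X → Y) = supp(X ∪ Y) / supp(X). The sketch below applies that test with a plain Python dict standing in for the key-value store. No Redis client or Sqoop is involved, and the function name and threshold are hypothetical, not the paper's implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float]  # (antecedent X, consequent Y, confidence)


def strong_rules(support: Dict[Itemset, int], min_confidence: float) -> List[Rule]:
    """Emit every rule X -> Y with supp(X | Y) / supp(X) >= min_confidence.

    `support` is a dict used here as a stand-in for a fast key-value store
    (the paper uses Redis) mapping each frequent itemset to its support count.
    """
    rules: List[Rule] = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        items = sorted(itemset)
        for k in range(1, len(items)):
            for antecedent_items in combinations(items, k):
                antecedent = frozenset(antecedent_items)
                # Every subset of a frequent itemset is itself frequent, so this lookup hits.
                confidence = count / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


if __name__ == "__main__":
    counts = {frozenset({"a"}): 2, frozenset({"b"}): 4, frozenset({"a", "b"}): 2}
    for x, y, c in strong_rules(counts, min_confidence=0.8):
        print(sorted(x), "->", sorted(y), round(c, 3))
```

Because every confidence check is a lookup of the antecedent's support, this step is read-heavy. That access pattern is what a fast in-memory store is meant to serve.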
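The abstract reports good speedup and scalability but does not define either metric. Assuming the conventional definitions used for parallel data mining (not taken from the paper), with T_p(D) the running time on p nodes for a dataset D:

```latex
S_p = \frac{T_1(D)}{T_p(D)}, \qquad
\mathrm{Scaleup}(p) = \frac{T_1(D)}{T_p(pD)}
```

Under these definitions the ideal is linear speedup, S_p = p, and a scaleup of 1, meaning a p-fold larger dataset on p nodes takes the same time as the original dataset on one node.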
