作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程

• 人工智能及识别技术 • 上一篇    下一篇

基于Nodeset的最大频繁项集挖掘算法

林晨,顾君忠   

  1. (华东师范大学 计算机科学技术系,上海 200241)
  • 收稿日期:2015-11-13 出版日期:2016-12-15 发布日期:2016-12-15
  • 作者简介:林晨(1990—),男,硕士研究生,主研方向为数据挖掘、情景感知计算;顾君忠,教授、博士生导师。
  • 基金资助:
    上海市国际科技合作项目(13430710100);上海市科委科技创新行动计划项目(13511506201)。

Maximal Frequent Itemset Mining Algorithm Based on Nodeset

LIN Chen,GU Junzhong   

  1. (Department of Computer Science and Technology,East China Normal University,Shanghai 200241,China)
  • Received:2015-11-13 Online:2016-12-15 Published:2016-12-15

摘要: 递归遍历、条件FP-Tree构建与超集检测是多数基于FP-Tree最大频繁项集挖掘算法的主要性能瓶颈。为此,提出一种基于Nodeset的最大频繁项集挖掘算法——MFIN算法。该算法采用Nodeset数据结构对POC-Tree的节点编码,将集合枚举树作为搜索空间,避免递归遍历和条件FP-Tree构建的时间开销。设计提前停止方法提高求解Nodeset交集的效率,采用父等价剪枝技术和前瞻剪枝技术缩小搜索空间。对基于MFI-Tree的投影策略进行改进,提升超集检测的速度。实验结果表明,MFIN算法在mushroom,pumsb,webdocs数据集上的运行时间及执行效率等总体性能明显优于基于FP-Tree的FP-Max算法。

关键词: 最大频繁项集, 关联规则, 剪枝技术, 前缀树, 超集检测

Abstract: The major performance bottlenecks of most maximal frequent itemset mining algorithms based on FP-Tree are caused by recursively traversing and constructing conditional FP-Trees and superset check.Therefore,this paper proposes a maximal frequent itemset mining algorithm based on Nodeset named MFIN.Instead of recursively traversing and constructing conditional FP-Trees,this algorithm adopts a novel data structure called Nodeset to encode the nodes of POC-Tree and uses a set-enumeration tree to represent search space.Besides,this paper proposes an early-stopping method to minimize the cost of intersection operation of Nodesets and adopts parent equivalence pruning technique and look-ahead pruning technique to reduce the search space.The efficiency of superset check method is promoted by improving the projection strategy based on MFI-Tree.Experimental results show that the performance of MFIN algorithm is superior to the classic FP-Max algorithm which is based on FP-Tree in running time and efficiency on mushroom,pumsb and webdocs datasets.

Key words: maximal frequent itemset, association rule, pruning technique, prefix tree, superset check

中图分类号: