
Computer Engineering

• Artificial Intelligence and Recognition Technology •

A Fast Mining Algorithm for Frequent Essential Itemsets

TIAN Wei-dong, JI Yun

  1. (School of Computer and Information, Hefei University of Technology, Hefei 230009, China)
  • Received: 2013-01-28 Online: 2014-06-15 Published: 2014-06-13
  • About the authors: TIAN Wei-dong (born 1970), male, associate professor, M.S.; main research interests: artificial intelligence, data mining. JI Yun, master's student.
  • Funding:
    Supported by the National Natural Science Foundation of China (60603068).

Abstract: Traditional frequent essential itemset mining generates candidate itemsets repeatedly and scans the database many times, which makes it inefficient. To address this, a fast algorithm for generating frequent essential itemsets, FMEP, is proposed. The algorithm uses a Rymon enumeration tree as its search space and applies a divide-and-conquer strategy to select specific paths for pruning. By exploiting the anti-monotone property specific to frequent essential itemsets, it can quickly decide whether a candidate itemset is a frequent essential itemset without comparing it against the disjunctive support of all of its direct subsets. Experimental results show that the algorithm mines all elements of the concise representation of frequent essential itemsets while reducing running time: compared with the MEP algorithm, the running time is reduced more than 2-fold on dense datasets and by at least 30% on sparse datasets.

Key words: data mining, frequent itemsets, concise representation, frequent essential itemset, Rymon enumeration tree

CLC number:
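
The notions used in the abstract can be illustrated with a small sketch. It is not FMEP or MEP, whose details are not given on this page. It assumes the usual definition from the essential-pattern literature: an itemset is essential when its disjunctive support (the number of transactions containing at least one of its items) is strictly greater than that of every direct subset. It performs exactly the baseline check against all direct subsets that the abstract says FMEP avoids, and it walks a Rymon-style set-enumeration tree depth first, pruning a subtree as soon as a candidate fails, which is valid under the assumed definition because both frequency and essentiality are anti-monotone. All function names are hypothetical.

```python
def conjunctive_support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def disjunctive_support(itemset, transactions):
    """Number of transactions that contain at least one item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items & t)


def is_essential(itemset, transactions):
    """Baseline check (assumed definition): compare against all direct subsets."""
    items = frozenset(itemset)
    if not items:
        return False
    own = disjunctive_support(items, transactions)
    return all(
        disjunctive_support(items - {x}, transactions) < own for x in items
    )


def mine_frequent_essential(transactions, min_support):
    """Depth-first walk of a set-enumeration tree over the sorted items.

    A child extends its parent only with items that come later in the
    order, so every itemset is visited at most once. A candidate that is
    infrequent or not essential is skipped together with its whole subtree.
    """
    items = sorted(set().union(*transactions))
    found = []

    def expand(prefix, start):
        for i in range(start, len(items)):
            candidate = prefix + (items[i],)
            if conjunctive_support(candidate, transactions) < min_support:
                continue  # anti-monotone: no superset can be frequent
            if not is_essential(candidate, transactions):
                continue  # anti-monotone: no superset can be essential
            found.append(candidate)
            expand(candidate, i + 1)

    expand((), 0)
    return found


# Toy data, invented for illustration only.
transactions = [{"a"}, {"b"}, {"a", "b"}, {"a", "c"}]
result = mine_frequent_essential(transactions, min_support=1)
```

In the toy data, item c only ever occurs together with a, so adding c to {a} does not raise the disjunctive support and {a, c} is rejected as non-essential. {b, c} passes the essentiality check but never occurs in any transaction, so it is rejected as infrequent.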