A Fast Mining Algorithm for Frequent Essential Itemsets

doi:10.3969/j.issn.1000-3428.2014.06.026

Computer Engineering


TIAN Wei-dong, JI Yun

(School of Computer and Information, Hefei University of Technology, Hefei 230009, China)

Received: 2013-01-28 Publication date: 2014-06-15 Release date: 2014-06-13
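About the authors: TIAN Wei-dong (1970-), male, associate professor, M.S., research interests: artificial intelligence and data mining; JI Yun, master's student.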
Funding: National Natural Science Foundation of China (60603068).


Abstract

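Traditional frequent essential itemset mining generates candidates repeatedly and scans the database many times, which makes it inefficient. To address this, FMEP, a fast algorithm for generating frequent essential itemsets, is proposed. FMEP uses a Rymon enumeration tree as its search space and applies a divide-and-conquer strategy that selects particular paths for pruning. It exploits an anti-monotone property specific to frequent essential itemsets to decide quickly whether a candidate itemset is a frequent essential itemset. This avoids comparing the candidate's disjunctive support with that of every direct subset, which speeds up mining. Experimental results show that the algorithm finds all elements of the concise representation based on frequent essential itemsets while reducing running time. Compared with the MEP algorithm, it reduces running time by a factor of more than 2 on dense datasets and by at least 30% on sparse datasets.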

Key words: data mining, frequent itemsets, concise representation, frequent essential itemset, Rymon enumeration tree
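The abstract names FMEP's ingredients but not its exact pruning rules. The following is therefore only a minimal reference sketch, not the authors' algorithm. It assumes the usual definitions from Casali et al.'s work on essential patterns:

- The disjunctive support Supp∨(X) is the number of transactions that contain at least one item of X.
- X is essential when Supp∨(X) > Supp∨(X∖{x}) for every x ∈ X.
- "Frequent" is taken to mean ordinary (conjunctive) support ≥ minsup.

Supp∨(X) − Supp∨(X∖{x}) counts the transactions whose intersection with X is exactly {x}. It follows that every subset of an essential itemset is also essential. Both conditions are therefore anti-monotone, so a failing node of the Rymon enumeration tree can be pruned together with its whole subtree. All function names are hypothetical. `is_essential` deliberately performs the per-subset comparison that FMEP is said to avoid.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def conj_support(itemset: Itemset, db: Sequence[Itemset]) -> int:
    """Ordinary support: transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def disj_support(itemset: Itemset, db: Sequence[Itemset]) -> int:
    """Disjunctive support: transactions that contain at least one item of `itemset`."""
    return sum(1 for t in db if itemset & t)


def is_essential(itemset: Itemset, db: Sequence[Itemset]) -> bool:
    """Definition-based test (assumed definition): the disjunctive support of X
    must exceed that of every direct subset X - {x}. This is the brute-force
    comparison the abstract says FMEP avoids; it serves only as a reference."""
    s = disj_support(itemset, db)
    return all(s > disj_support(itemset - {x}, db) for x in itemset)


def mine_frequent_essential(db: Sequence[Itemset], minsup: int) -> List[Itemset]:
    """Depth-first walk of a Rymon enumeration tree over the sorted items.

    The children of node X are X | {i} for items i that come after every item
    of X in the fixed order, so each itemset is visited at most once. Frequency
    and essentiality are both anti-monotone. When a node fails either test, its
    whole subtree (all of whose nodes are supersets of it) is skipped.
    """
    items = sorted(set().union(*db))
    found: List[Itemset] = []

    def expand(node: Itemset, tail: List[str]) -> None:
        for i, item in enumerate(tail):
            cand = node | {item}  # child node in the Rymon tree
            if conj_support(cand, db) >= minsup and is_essential(cand, db):
                found.append(cand)
                expand(cand, tail[i + 1:])  # only later items may extend it

    expand(frozenset(), items)
    return found


if __name__ == "__main__":
    toy_db = [frozenset("ab"), frozenset("ab"), frozenset("a"), frozenset("bc")]
    print(sorted("".join(sorted(s)) for s in mine_frequent_essential(toy_db, minsup=1)))
    # prints ['a', 'ab', 'b', 'c']
```

In this toy database, {b, c} is frequent (support 1) but not essential. Every transaction containing c also contains b, so Supp∨({b, c}) = Supp∨({b}) = 3, and the node is pruned with its subtree.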


CLC number: TP18

TIAN Wei-dong, JI Yun. A Fast Mining Algorithm for Frequent Essential Itemsets[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2014.06.026.

http://www.ecice06.com/CN/Y2014/V40/I6/120

Editor: SUO Shu-zhi
