Abstract:
Traditional maximal frequent pattern mining ties itemset frequency to itemset length: under a given support threshold, it cannot return frequent patterns that contain more items than the traditional maximal frequent patterns. To relax this constraint, this paper proposes the concept of the Maximal Sub-Frequent Pattern (MSFP) and a corresponding mining algorithm, MSFP-mining. The main contributions are: (1) the definition of MSFP and an analysis of its characteristics; (2) a candidate-MSFP mining method based on AFP-tree, CMP-tree, SFP-tree, and SFP-growth; (3) a superset check for maximal sub-frequent patterns and a pruning strategy, both based on MSFP-tree; and (4) an experimental evaluation of the mining performance of MSFP-mining. Experimental results show that the algorithm uses differentiated frequencies to obtain and combine core itemsets, additional frequent itemsets, and supplementary frequent itemsets in stages, so that it mines maximal sub-frequent patterns while preserving itemset frequency and thereby expands the scale of the frequent patterns found.
Key words:
pattern mining,
Maximal Sub-Frequent Pattern (MSFP),
data set,
superset check,
MSFP-tree structure
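The superset check named in the abstract can be illustrated with a minimal sketch of the generic idea used in maximal pattern mining: a candidate itemset is kept only if no pattern already kept is a superset of it. The sketch assumes candidates are given as plain sets of item labels and uses a linear scan over a list; it does not reproduce the paper's MSFP-tree structure or its pruning strategy, and the names `is_subsumed` and `keep_maximal` are hypothetical.

```python
from typing import FrozenSet, Iterable, List

Itemset = FrozenSet[str]


def is_subsumed(candidate: Itemset, maximal: List[Itemset]) -> bool:
    """Return True if some already-kept pattern contains the candidate."""
    return any(candidate <= kept for kept in maximal)


def keep_maximal(candidates: Iterable[Iterable[str]]) -> List[Itemset]:
    """Filter candidates down to those with no proper superset among them.

    Assumption: a plain list stands in for the tree-based index that a
    real miner would use to speed up the superset lookup.
    """
    unique = {frozenset(c) for c in candidates}
    maximal: List[Itemset] = []
    # Longest first: any proper superset is strictly longer, so it is
    # always examined (and kept) before the subsets it subsumes.
    for cand in sorted(unique, key=len, reverse=True):
        if not is_subsumed(cand, maximal):
            maximal.append(cand)
    return maximal


if __name__ == "__main__":
    found = keep_maximal([{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"d"}])
    print(sorted(sorted(p) for p in found))
```

The linear scan costs time proportional to the number of kept patterns per candidate; indexing the kept patterns in a tree, as the abstract describes for MSFP-tree, is the usual way to reduce that cost.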
CLC number:
ZHANG Hai-Qing, LIU Yin-Tian. Research on Mining Algorithm of Maximal Sub-Frequent Pattern[J]. Computer Engineering (计算机工程), 2011, 37(14): 62-64.