Abstract: To address the candidate-itemset generation of the Apriori algorithm, this paper proposes a frequent itemset mining algorithm based on a Sorting Index Matrix (SIM). The vectors formed from the frequent 1-itemsets are multiplied in turn by the corresponding matrix to generate the frequent 2-itemsets directly. Starting from the frequent 3-itemsets, a SIM is built for each newly generated set of frequent k-itemsets, and the SIM structure is used to carry out spanning search and joining of itemsets. The whole process scans the database only once and produces no candidate itemsets. Experimental results show that the algorithm improves the efficiency of frequent itemset mining.
Key words:
association rule,
Sorting Index Matrix (SIM),
candidate itemsets,
frequent itemsets,
spanning search,
data mining
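The vector-by-matrix step for frequent 2-itemsets can be illustrated with a minimal sketch. It assumes a 0/1 transaction-by-item encoding, which the abstract does not spell out; the names `frequent_2_itemsets`, `transactions`, and `min_support` are hypothetical. The SIM stage for k ≥ 3 is not sketched, because its layout is not defined here.

```python
from itertools import combinations


def frequent_2_itemsets(transactions, min_support):
    """Return {(a, b): support} for the frequent 2-itemsets.

    Assumption: each item is encoded as a 0/1 column vector with one
    entry per transaction, so supports become dot products.
    """
    n = len(transactions)

    # Single database scan: build one 0/1 vector per item.
    columns = {}
    for row, transaction in enumerate(transactions):
        for item in set(transaction):
            columns.setdefault(item, [0] * n)[row] = 1

    # Frequent 1-itemsets: the sum of a vector is that item's support.
    frequent_items = sorted(
        item for item, col in columns.items() if sum(col) >= min_support
    )

    # The dot product of two item vectors counts the transactions that
    # contain both items; taken over all pairs, this is the product of
    # each 1-itemset vector with the matrix of the remaining vectors.
    result = {}
    for a, b in combinations(frequent_items, 2):
        support = sum(x * y for x, y in zip(columns[a], columns[b]))
        if support >= min_support:
            result[(a, b)] = support
    return result
```

After the vectors are built, the supports come from arithmetic on them alone, with no further pass over the transactions, which is consistent with the single-scan claim in the abstract.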
XUN Jiao (荀娇), XU Lian-Cheng (徐连诚), YANG Ren-Hua (杨仁华). Frequent Itemsets Mining Algorithm Based on Sorting Index Matrix (基于排序索引矩阵的频繁项集挖掘算法)[J]. Computer Engineering (计算机工程), 2012, 38(19): 41-44, 48.