Abstract: Mining all frequent itemsets in dense datasets is very expensive. To address this problem, a new data structure, the linked list array, is proposed together with a fast algorithm based on it for generating Maximal Frequent Itemsets (MFI). The method uses the linked list array to build a transaction linked list for each item, and building these lists requires only one scan of the database. A depth-first search generates all candidate maximal frequent itemsets, and constraint conditions are used to reduce the search space. The algorithm is validated on standard datasets and compared with other algorithms; the experimental results show that it mines MFI faster than the algorithms it was compared with.
Key words: data mining, Maximal Frequent Itemsets (MFI), linked list array, solution space
CLC number:
LIU Ying-dong (刘应东); LENG Ming-wei (冷明伟); CHEN Xiao-yun (陈晓云). Maximal Frequent Itemsets Mining Algorithm Based on Linked List Array[J]. Computer Engineering (计算机工程), 2010, 36(06): 89-90.
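
The abstract describes two steps: building one transaction list per item in a single database scan, then running a depth-first search over itemsets with a pruned search space. The sketch below is only an illustration of that general idea, not the authors' algorithm. It assumes plain Python lists and sets in place of the linked list array, uses the minimum-support threshold as the only pruning rule (the abstract does not state the paper's constraint conditions), and the names `build_item_lists` and `maximal_frequent_itemsets` are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List, Optional, Sequence, Set


def build_item_lists(
    transactions: Sequence[Sequence[Hashable]],
) -> Dict[Hashable, List[int]]:
    """Scan the database once and record, for each item, the ids of the
    transactions that contain it (a stand-in for the per-item linked lists)."""
    item_tids: Dict[Hashable, List[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):  # ignore repeated items in a transaction
            item_tids.setdefault(item, []).append(tid)
    return item_tids


def maximal_frequent_itemsets(
    transactions: Sequence[Sequence[Hashable]], min_support: int
) -> List[FrozenSet[Hashable]]:
    """Depth-first enumeration of maximal frequent itemsets.

    Assumption: the only pruning used here is the minimum-support check;
    the paper's own constraint conditions are not given in the abstract.
    """
    item_tids = build_item_lists(transactions)
    # Keep only frequent single items, in a fixed order for the search.
    frequent = sorted(
        item for item, tids in item_tids.items() if len(tids) >= min_support
    )
    tidsets: Dict[Hashable, Set[int]] = {i: set(item_tids[i]) for i in frequent}
    maximal: List[FrozenSet[Hashable]] = []

    def dfs(prefix: List[Hashable], prefix_tids: Optional[Set[int]],
            candidates: List[Hashable]) -> None:
        extended = False
        for pos, item in enumerate(candidates):
            # Support of prefix + item is the size of the tid intersection.
            tids = tidsets[item] if prefix_tids is None else prefix_tids & tidsets[item]
            if len(tids) >= min_support:
                extended = True
                dfs(prefix + [item], tids, candidates[pos + 1:])
        if prefix and not extended:
            candidate = frozenset(prefix)
            # Supersets containing an earlier item were visited earlier in
            # this ordered search, so one subset check is enough here.
            if not any(candidate <= found for found in maximal):
                maximal.append(candidate)

    dfs([], None, frequent)
    return maximal


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset in maximal_frequent_itemsets(db, min_support=3):
        print(sorted(itemset))
```

Intersecting transaction-id sets avoids rescanning the database for every candidate, which is the benefit the abstract attributes to building the per-item lists in one pass.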