Abstract:
This paper proposes a unified coded frequent pattern-tree (CFP-tree) structure for storing both one-dimensional and multidimensional sequence data. The proposed algorithm finds frequent sequential patterns through progressive prefix sequence search, which avoids recursively generating large numbers of intermediate subsequences during mining. Experiments show that it outperforms existing sequential pattern mining algorithms, especially on large databases.
Key words: data mining, sequential pattern, multidimensional sequence
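To make the terms in the abstract concrete, the following is a minimal Python sketch of frequent sequential pattern mining by growing prefixes one item at a time. It is not the paper's CFP-tree algorithm: the abstract gives no details of the tree encoding, so this sketch simply rescans the database for every candidate. The function names (`is_subsequence`, `support`, `frequent_sequences`) and the toy database are assumptions made for illustration, and each sequence is assumed to be a tuple of single items.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so later items must appear after earlier ones.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Count the sequences in `database` that contain `pattern` as a subsequence."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def frequent_sequences(database, min_support):
    """Naive prefix growth (illustrative only, not the CFP-tree method).

    Start from frequent single items and extend each frequent prefix by one
    frequent item at a time, keeping only extensions that are still frequent.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    items = sorted({item for seq in database for item in seq})
    frequent_items = [i for i in items if support((i,), database) >= min_support]

    result = {}
    frontier = [(i,) for i in frequent_items]
    while frontier:
        next_frontier = []
        for prefix in frontier:
            result[prefix] = support(prefix, database)
            for item in frequent_items:
                candidate = prefix + (item,)
                if support(candidate, database) >= min_support:
                    next_frontier.append(candidate)
        frontier = next_frontier
    return result


# Hypothetical toy database of four one-dimensional sequences.
db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
patterns = frequent_sequences(db, min_support=2)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```

The repeated calls to `support` are the kind of cost that a shared tree structure is meant to reduce; how the CFP-tree achieves this is described in the paper itself, not here.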
XU Chunyan. Sequential Patterns Mining Algorithm Based on Coded Frequent Pattern-tree[J]. Computer Engineering, 2007, 33(06): 65-68.
胥春艳. 基于编码频繁模式树的序列模式挖掘算法[J]. 计算机工程, 2007, 33(06): 65-68.