摘要: 针对频繁模式增长算法无法适应数据流的无限性和流动性的特点,提出一种新颖的FP-tree的变形结构-SP-tree,只需单遍扫描便能容纳全部数据库信息。为使SP-tree具有与FP-tree一样良好的压缩性能,给出一种有效的动态重构树的方法,称为宽度排序方法,该方法能够在挖掘过程中动态地逐条分支地重构树,最终产生一棵频繁递减的前缀树。实验结果表明,SP-tree的压缩性能优于其他单遍扫描的前缀树结构。
关键词:
数据流,
频繁模式增长算法,
单遍扫描模式树,
宽度排序方法
Abstract: Aiming at the problem that FP-growth algorithm requires two database scans, which are not consistent with efficient data stream processing, this paper presents a novel tree structure which is a variation of FP-tree, called SP-tree, which captures database information with one scan. For making SP-tree have the same compact performance, it presents an efficient dynamic tree restructuring method, called the breadth sorting method, which restructures a frequency-descending prefix-tree branch-by-branch. Experimental results show that compact performance of the SP-tree is better than other prefix-tree structure with one scan.
Key words:
data stream,
FP-growth algorithm,
single-pass pattern tree,
breadth sorting method
中图分类号:
谭军, 卜英勇, 杨勃. 一种单遍扫描频繁模式树结构[J]. 计算机工程, 2010, 36(14): 32-33.
TAN Jun, BO Yang-Yong, YANG Bo. Single-pass Frequent Pattern Tree Structure[J]. Computer Engineering, 2010, 36(14): 32-33.