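Abstract: Sequential pattern mining in a shared-memory parallel computing environment suffers from unbalanced processor workloads and the lack of an effective pruning strategy. To address these problems, dynamic task allocation is adopted to balance the workload among processors, and a parallel local pruning technique is used to eliminate the repeated generation and computation of projected databases, thereby improving mining efficiency. A parallel sequential pattern mining algorithm for shared-memory SMP systems, PFSPAN, is designed. Algorithm analysis and experimental results show that PFSPAN mines sequential patterns effectively.
Keywords:
data mining,
sequential patterns,
parallel processing,
task allocation,
local pruning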
Abstract: In a shared-memory parallel computing environment, sequential pattern mining faces two key problems: unbalanced processor workloads and the lack of effective pruning methods. To solve these problems, this paper proposes a dynamic task distribution method that balances the workload among processors, and a parallel local pruning technique that improves mining efficiency by avoiding the repeated generation and computation of projected databases. Building on these two methods, a parallel sequential pattern mining algorithm for Symmetric MultiProcessor (SMP) systems, the Parallel Fast Sequential Pattern mining algorithm (PFSPAN), is presented. Both theoretical analysis and experiments show that PFSPAN mines sequential patterns effectively.
Keywords:
data mining,
sequential patterns,
parallel processing,
task distribution,
local pruning
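The idea of dynamic task distribution over projected databases can be illustrated with a minimal sketch. It is not the PFSPAN algorithm itself, whose details are not given in the abstract: it assumes sequences of single items, a PrefixSpan-style projection, and a shared queue from which idle workers pull the next frequent prefix. All names (`project`, `mine`, `parallel_mine`) are hypothetical, and the local pruning step is not shown.

```python
import queue
import threading
from collections import Counter


def project(db, item):
    """Suffixes following the first occurrence of `item` (empty ones dropped)."""
    suffixes = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                suffixes.append(suffix)
    return suffixes


def mine(prefix, db, min_sup, out):
    """Recursively extend `prefix` with items frequent in the projected db."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))  # count each item once per sequence
    for item, sup in counts.items():
        if sup >= min_sup:
            pattern = prefix + (item,)
            out[pattern] = sup
            mine(pattern, project(db, item), min_sup, out)


def parallel_mine(db, min_sup, n_workers=4):
    """Assumed scheme: one task per frequent item, pulled dynamically."""
    counts = Counter()
    for seq in db:
        counts.update(set(seq))
    results, tasks, lock = {}, queue.Queue(), threading.Lock()
    for item, sup in counts.items():
        if sup >= min_sup:
            results[(item,)] = sup
            tasks.put(item)

    def worker():
        while True:
            try:
                item = tasks.get_nowait()  # idle worker takes the next task
            except queue.Empty:
                return
            local = {}
            mine((item,), project(db, item), min_sup, local)
            with lock:  # merge this task's patterns into the shared result
                results.update(local)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("a", "b"), ("b", "c")]
    for pattern, sup in sorted(parallel_mine(db, min_sup=2).items()):
        print(pattern, sup)
```

Because workers take tasks only when they become idle, a prefix with a large projected database does not hold up the others, which is the load-balancing effect the abstract refers to. In CPython, threads mainly illustrate the scheduling pattern; real speedup would need processes or a runtime without a global interpreter lock.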
田卫东;姜海辉. 一种有效的并行序列模式挖掘算法[J]. 计算机工程, 2009, 35(18): 59-61.
TIAN Wei-dong; JIANG Hai-hui. Effective Mining Algorithm for Parallel Sequential Patterns[J]. Computer Engineering, 2009, 35(18): 59-61.