Computer Engineering, 2009, Vol. 35, Issue (23): 56-58,6. doi: 10.3969/j.issn.1000-3428.2009.23.020

• Software Technology and Database •

Improved Algorithm for Sequential Pattern Mining Based on PrefixSpan

WANG Lin-lin1,2 (汪林林), FAN Jun1 (范军)

  1. College of Computer Science & Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065
  2. Chongqing Institute of Technology, Chongqing 400050
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-12-05 Published: 2009-12-05

Abstract: The sequential pattern mining algorithm PrefixSpan has to construct a large number of projected databases during mining. To address this shortcoming, this paper proposes IPMSP, an Improved PrefixSpan algorithm for Mining Sequential Patterns. During recursive mining, IPMSP checks the prefix of the sequence database with respect to a prefix, so that it does not construct duplicate projected databases for the same frequent prefix pattern. It also discards infrequent items instead of storing them, and it stops scanning a projected database once the number of projected sequences falls below the minimum support. Together these changes improve the time and space performance of PrefixSpan. Experimental results show that IPMSP outperforms PrefixSpan in both time and space.

Key words: sequential pattern, PrefixSpan algorithm, projected database

CLC Number:
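
To make the pruning ideas named in the abstract concrete, the following is a minimal Python sketch of prefix-projection mining. It is not the authors' IPMSP implementation: the function name `prefixspan`, the helper `mine`, and the restriction to sequences of single items (no itemsets) are assumptions made for illustration. It shows only two of the ideas from the abstract, namely dropping infrequent items from the projected sequences and skipping any projected database that holds fewer sequences than the minimum support. The duplicate-projection check is not shown, because the abstract does not describe it in enough detail.

```python
from collections import Counter


def prefixspan(db, min_support):
    """Return {pattern (tuple): support} for a database of item sequences.

    Hypothetical sketch: each sequence is a list of single items.
    """
    results = {}

    def mine(prefix, projected):
        # Pruning from the abstract: a projected database with fewer
        # sequences than min_support cannot contain a frequent item.
        if len(projected) < min_support:
            return

        # Support counts each item at most once per projected sequence.
        counts = Counter()
        for seq in projected:
            counts.update(set(seq))
        frequent = {item for item, c in counts.items() if c >= min_support}

        # Pruning from the abstract: do not store infrequent items.
        # They cannot appear in any frequent extension of this prefix.
        pruned = [[x for x in seq if x in frequent] for seq in projected]

        for item in sorted(frequent):
            pattern = prefix + (item,)
            results[pattern] = counts[item]

            # Project on the new prefix: keep what follows the first
            # occurrence of the item, and skip empty suffixes.
            suffixes = []
            for seq in pruned:
                if item in seq:
                    suffix = seq[seq.index(item) + 1:]
                    if suffix:
                        suffixes.append(suffix)
            mine(pattern, suffixes)

    mine((), [list(s) for s in db])
    return results
```

Calling `prefixspan([["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]], 2)` returns the three single-item patterns with support 3 and the three two-item patterns `("a", "b")`, `("a", "c")`, `("b", "c")` with support 2. The pattern `("a", "b", "c")` occurs in only one sequence and is pruned because its projected database is too small.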