Computer Engineering ›› 2007, Vol. 33 ›› Issue (17): 80-82. doi: 10.3969/j.issn.1000-3428.2007.17.028

• Software Technology and Database •

WebLog Access Sequential Pattern Mining Based on SPAM-FPT

ZHU Li1, YING Ji-kang1, BU Zhong-fei2   

  1. Computing Center of Information Institute, East China Normal University, Shanghai 200062; 2. Education Department of Yangzhou, Yangzhou 225002
  • Online: 2007-09-05 Published: 2007-09-05

Abstract:

WebLog mining applies sequential pattern mining, a data mining technique, to Web server log files. The sequential patterns mined from Web logs are used to improve the quality of information services on the Web. The main challenge in mining access sequential patterns from Web logs is the high processing cost caused by the large volume of data. Combining ideas from SPAM and PrefixSpan, this paper proposes a new algorithm, SPAM-FPT. By constructing a First_Position_Table, SPAM-FPT avoids the “joining” and “ANDing” operations of SPAM as well as the generation of a large number of projected databases in PrefixSpan, and quickly finds all frequent sequential patterns in the database.

Key words: sequential pattern mining, WebLog, frequent subsequence, SPAM-FPT
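
The abstract does not specify the layout of the First_Position_Table, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' SPAM-FPT implementation. It assumes each session is a list of single page identifiers and that the table stores, for every offset in a session, the first position at which each page occurs at or after that offset. A prefix is then extended by table lookups alone, without bitmap ANDing and without copying projected databases. The names build_first_position_table and mine_frequent_sequences are hypothetical.

```python
from collections import defaultdict


def build_first_position_table(sequences):
    """Assumed table layout: tables[s][i][item] is the first index >= i
    at which `item` occurs in sequence s (missing if it never occurs)."""
    tables = []
    for seq in sequences:
        nxt = [dict() for _ in range(len(seq) + 1)]
        # Fill from the end so each row extends the row after it.
        for i in range(len(seq) - 1, -1, -1):
            nxt[i] = dict(nxt[i + 1])
            nxt[i][seq[i]] = i
        tables.append(nxt)
    return tables


def mine_frequent_sequences(sequences, min_support):
    """Return {pattern tuple: support} for all frequent subsequences."""
    tables = build_first_position_table(sequences)
    patterns = {}

    def grow(prefix, starts):
        # starts maps sequence id -> index just past the earliest match
        # of `prefix` in that sequence.
        candidates = defaultdict(dict)
        for sid, start in starts.items():
            for item, pos in tables[sid][start].items():
                candidates[item][sid] = pos + 1
        for item in sorted(candidates):
            next_starts = candidates[item]
            if len(next_starts) >= min_support:
                pattern = prefix + (item,)
                patterns[pattern] = len(next_starts)
                grow(pattern, next_starts)

    grow((), {sid: 0 for sid in range(len(sequences))})
    return patterns


if __name__ == "__main__":
    sessions = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
    for pattern, support in mine_frequent_sequences(sessions, 2).items():
        print(pattern, support)
```

Tracking only the earliest match of each prefix is sufficient for counting support, because any later match leaves fewer positions available for extension. The dense per-offset table trades memory for constant-time lookups; a real implementation for large logs would likely need a more compact layout.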
