Abstract: This paper studies how to quickly mine Repeated Continuous Frequent Access (RCFA) paths from Web logs. To address shortcomings of existing mining algorithms, it applies a matrix to the mining process: it introduces the concept of the Continuous Access (CA) matrix and uses that matrix to mine RCFA paths. As a result, the database does not need to be scanned repeatedly and large numbers of intermediate items are not generated, which simplifies the mining process to some extent. Experiments indicate that the algorithm is accurate and efficient.
Key words:
Web log,
continuous frequent access path,
Continuous Access (CA) matrix,
intermediate item
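The abstract does not define the CA matrix, so the following is only a minimal sketch of the general idea of counting continuous accesses in a matrix during a single pass over the sessions. It assumes that entry (a, b) holds the number of times page b is requested immediately after page a, and that repeated occurrences within one session are all counted. The function names `build_transition_matrix` and `frequent_pairs`, the sample sessions, and the threshold are hypothetical and are not taken from the paper.

```python
def build_transition_matrix(sessions):
    """Count, in one pass, how often page b directly follows page a.

    Assumption: this adjacency-count matrix stands in for the paper's
    CA matrix, whose exact definition is not given in the abstract.
    """
    pages = sorted({page for session in sessions for page in session})
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in pages]
    for session in sessions:
        # zip pairs each request with the one immediately after it
        for a, b in zip(session, session[1:]):
            matrix[index[a]][index[b]] += 1
    return pages, matrix


def frequent_pairs(pages, matrix, min_count):
    """Read frequent length-2 continuous paths directly off the matrix."""
    return [
        (pages[i], pages[j], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if count >= min_count
    ]


# Hypothetical sessions; the first one revisits pages A and B.
sessions = [
    ["A", "B", "A", "B", "C"],
    ["A", "B", "C"],
    ["B", "C", "D"],
]
pages, matrix = build_transition_matrix(sessions)
print(frequent_pairs(pages, matrix, min_count=3))
# prints [('A', 'B', 3), ('B', 'C', 3)]
```

The sketch only reads off paths of length two. How the paper extends the matrix to longer RCFA paths is not stated in the abstract and is not attempted here.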
XU Xiao-Dong, LI Ke, ZHU Shi-Rui. Mining Method of RCFA Paths from Web Logs[J]. Computer Engineering, 2010, 36(22): 55-57.