
Computer Engineering ›› 2010, Vol. 36 ›› Issue (22): 55-57. doi: 10.3969/j.issn.1000-3428.2010.22.019

• Software Technology and Database •

Mining Method of RCFA Paths from Web Logs

XU Xiao-dong1,2, LI Ke2, ZHU Shi-rui2

  1. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China; 2. Network Center, Jiangsu University, Zhenjiang 212013, China
  • Online: 2010-11-20 Published: 2010-11-18
  • About the authors: XU Xiao-dong (born 1965), male, associate professor; main research interests: Web mining, network management, and system integration. LI Ke and ZHU Shi-rui, master's students.
  • Supported by:
    Scientific Research Fund for Universities of the Jiangsu Provincial Department of Education (03KJD520073)

Abstract: This paper studies a method for quickly mining Repeated Continuous Frequent Access (RCFA) paths from Web logs. To address problems in existing mining algorithms, it applies a matrix to the mining process, introduces the concept of the Continuous Access (CA) matrix, and uses that matrix to mine RCFA paths. As a result, the method does not need to scan the database repeatedly and avoids generating a large number of intermediate items, which simplifies the mining process to some extent. Experiments indicate that the algorithm is accurate and efficient.

Key words: Web log, continuous frequent access path, Continuous Access (CA) matrix, intermediate item
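
The abstract does not define the CA matrix, so the following is only a rough illustration of the general idea of collecting path counts in a matrix during a single scan of the log. It assumes one plausible reading, in which entry (a, b) counts how often page b is requested immediately after page a, with repeats inside one session counted each time. The function names, the session format, and the `min_count` threshold are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_transition_matrix(sessions):
    """Scan the sessions once and count direct page-to-page transitions.

    Assumption (not from the paper): matrix[a][b] is the number of times
    page b directly follows page a, stored sparsely as nested dicts.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # zip pairs each page with its immediate successor in the session
        for prev_page, next_page in zip(session, session[1:]):
            matrix[prev_page][next_page] += 1
    return matrix


def frequent_pairs(matrix, min_count):
    """Return the length-2 contiguous paths seen at least min_count times."""
    return sorted(
        (a, b, n)
        for a, row in matrix.items()
        for b, n in row.items()
        if n >= min_count
    )


# Hypothetical sessions: each list is one user's ordered page requests.
sessions = [
    ["A", "B", "C", "A", "B"],
    ["B", "C", "D"],
    ["A", "B", "C"],
]
matrix = build_transition_matrix(sessions)
print(frequent_pairs(matrix, min_count=3))
```

This sketch only reads length-2 paths off the matrix. How the paper extends the CA matrix to longer RCFA paths is not stated in the abstract and is not reproduced here.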
