Abstract: The GITC and Tree-DM algorithms both mine access patterns on the basis of the intersection relation. This paper analyzes the performance characteristics of the two algorithms and proposes GI, an improved version of GITC. GI keeps support-count information in a suitable data structure, which eliminates the large amount of time otherwise spent scanning the original database to count the support of each candidate. It also resolves the problems of repeated and redundant intersection computation found in Tree-DM. Experiments show that GI mines frequent user access patterns more efficiently than GITC.
Key words:
Web log mining,
frequent access pattern,
intersection relation
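The central idea in the abstract, keeping support counts in a data structure instead of rescanning the database for each candidate, can be illustrated with a small sketch. This is not the GI algorithm as published, whose details the abstract does not give. It is a generic cumulative-intersection scheme under two assumptions: each user session is treated as an unordered set of pages, and the function name `mine_by_intersection` and the dictionary `support` are hypothetical names chosen for illustration.

```python
def mine_by_intersection(sessions, min_support):
    """Illustrative sketch only (not the published GI algorithm).

    Candidate patterns are produced by intersecting each new session with
    the patterns found so far. Their support counts are kept in a dict, so
    the session list is read exactly once and never rescanned.
    """
    support = {}  # pattern (frozenset of pages) -> number of sessions containing it
    for session in sessions:
        current = frozenset(session)
        if not current:
            continue
        # The session itself is a candidate; its base count is 0 unless a
        # known pattern already contains it.
        updates = {current: 0}
        for pattern, count in support.items():
            common = pattern & current
            if common:
                # The largest count among patterns that intersect to `common`
                # is the support of `common` in the sessions seen so far.
                updates[common] = max(updates.get(common, 0), count)
        # Every pattern in `updates` is contained in the current session.
        for pattern, base in updates.items():
            support[pattern] = base + 1
    return {p: c for p, c in support.items() if c >= min_support}


if __name__ == "__main__":
    logs = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "B", "C"]]
    for pattern, count in mine_by_intersection(logs, 3).items():
        print(sorted(pattern), count)
```

The patterns returned are those that can be written as an intersection of sessions, and each carries its exact support. Because the counts are updated as each session arrives, no second pass over the log is needed, which is the kind of saving the abstract attributes to GI.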
CLC number:
GUO Wei. Improvement of GITC Algorithm on Web Log Mining[J]. Computer Engineering, 2008, 34(4): 60-62.