Abstract: Frequent pattern mining over XML documents is an important topic in XML-related research. Building on the existing frequent tree structure mining algorithm WL, this paper proposes AFPMX_CST, an efficient algorithm for mining frequent patterns from XML data stored as a compressed structure tree. The algorithm compresses the search space and reduces the number of scans, and so outperforms WL in both time and space efficiency. The paper also studies the method and process for converting the mining results into the corresponding DTD format. Experimental results show that AFPMX_CST is feasible and effective.
Key words: XML, data mining, frequent pattern, algorithm, DTD
CAO Hongqi; NIU Tianyun; SUN Zhihui. Research of Frequent Pattern Mining from XML Data Based on Compressed Structure Tree [J]. Computer Engineering, 2006, 32(19): 108-110.