Computer Engineering ›› 2006, Vol. 32 ›› Issue (19): 108-110. doi: 10.3969/j.issn.1000-3428.2006.19.039

• Software Technology and Database •

Research of Frequent Pattern Mining from XML Data Based on Compressed Structure Tree

CAO Hongqi1, NIU Tianyun2, SUN Zhihui2

  1. Department of Electronic Engineering, Nantong Vocational College, Nantong 226007
  2. Department of Computer Science and Engineering, Southeast University, Nanjing 210096
  • Online: 2006-10-05 Published: 2006-10-05

Abstract: Mining frequent patterns from XML documents is an important topic in XML-related research. Building on the existing frequent tree structure mining algorithm WL, this paper proposes AFPMX_CST, an efficient algorithm for mining frequent patterns in XML data that stores the data in a compressed structure tree. The algorithm compresses the search space and reduces the number of scans, and so outperforms WL in both time and space efficiency. The paper also studies the method and process for converting the mining results into the corresponding DTD format. Experimental results show that AFPMX_CST is feasible and effective.

Key words: XML, Data mining, Frequent pattern, Algorithm, DTD
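
The abstract does not spell out how the compressed structure tree is built, so the following is only an illustrative sketch and not the AFPMX_CST algorithm itself. It assumes one plausible reading: identical root-to-node tag paths from all documents are merged into a single tree node that carries a per-document count, so each document is parsed once and frequent paths are then read off the merged tree. The names `CSTNode`, `merge_document`, and `frequent_paths`, the sample documents, and the support threshold are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class CSTNode:
    """Hypothetical merged-tree node: a tag plus the number of documents containing its path."""
    tag: str
    doc_count: int = 0
    children: dict = field(default_factory=dict)


def merge_document(cst_root: CSTNode, xml_text: str) -> None:
    """Merge one XML document into the tree, counting each tag path once per document."""
    doc_root = ET.fromstring(xml_text)
    seen = set()  # ids of merged-tree nodes already counted for this document

    def walk(element, parent):
        node = parent.children.setdefault(element.tag, CSTNode(element.tag))
        if id(node) not in seen:
            seen.add(id(node))
            node.doc_count += 1
        for child in element:
            walk(child, node)

    walk(doc_root, cst_root)


def frequent_paths(cst_root: CSTNode, min_support: int):
    """Return (path, count) pairs for tag paths found in at least min_support documents."""
    result = []

    def visit(node, prefix):
        for child in node.children.values():
            # A path can never be more frequent than its prefix, so
            # infrequent nodes are pruned together with their subtrees.
            if child.doc_count >= min_support:
                path = prefix + (child.tag,)
                result.append((path, child.doc_count))
                visit(child, path)

    visit(cst_root, ())
    return result


# Assumed sample data, not taken from the paper.
docs = [
    "<book><title/><author/><price/></book>",
    "<book><title/><author/><author/></book>",
    "<book><title/><year/></book>",
]
root = CSTNode("#root")
for text in docs:
    merge_document(root, text)

for path, count in frequent_paths(root, min_support=2):
    print("/".join(path), count)
```

With these three sample documents and a minimum support of 2, the sketch reports `book`, `book/title`, and `book/author` as frequent. It handles only simple tag paths; mining general frequent subtrees, as the paper's title suggests, would need more than this.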

CLC Number: