
Computer Engineering ›› 2010, Vol. 36 ›› Issue (13): 87-89,92. doi: 10.3969/j.issn.1000-3428.2010.13.031

• Networks and Communications •

XML Data Stream Clustering Algorithm Based on Sliding Window

YAO Wen-ji, GAO Ming-xia, MAO Guo-jun, LI Guang-kui   

  1. (School of Computer, Beijing University of Technology, Beijing 100124)
  • Online:2010-07-05 Published:2010-07-05

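  • About the authors: YAO Wen-ji (b. 1984), male, master's student, main research interests: data mining and knowledge discovery; GAO Ming-xia, lecturer; MAO Guo-jun, professor; LI Guang-kui, master's student
  • Supported by:
    National Natural Science Foundation of China project "Research on Integrated Pattern Mining Models and Concept Drift Detection Algorithms for Distributed Data Streams" (60496322); Beijing University of Technology Doctoral Start-up Fund project (X0007011200901)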

Abstract: Based on a study of XML data stream clustering, this paper proposes SW-XSCLS, a sliding-window-based clustering algorithm for XML data streams. The algorithm applies the sliding window technique and uses the Exponential Histogram of Clustering Feature (EHCF) as its synopsis data structure. It dynamically eliminates outdated data and better preserves the data distribution within the current window, and therefore obtains higher-quality clustering results. Theoretical analysis and experimental results show that the algorithm achieves higher clustering quality and faster processing speed.

Key words: XML data stream, sliding window, clustering, exponential histogram
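
To make the synopsis idea in the abstract more concrete, the following Python sketch shows one way an exponential histogram of cluster features can discard outdated data under a sliding window. It is only an illustration under stated assumptions, not the SW-XSCLS implementation: it assumes each XML document has already been encoded as a fixed-length numeric vector (the abstract does not say how), it follows the generic exponential-histogram merge rule, and the names Bucket, EHCFSketch, window and max_per_size are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Bucket:
    n: int            # number of points summarized by this bucket
    ls: List[float]   # per-dimension linear sum of the points
    ss: List[float]   # per-dimension sum of squares of the points
    newest: int       # timestamp of the most recent point in the bucket


def merge(a: Bucket, b: Bucket) -> Bucket:
    """Cluster features are additive, so merging is a component-wise sum."""
    return Bucket(
        a.n + b.n,
        [x + y for x, y in zip(a.ls, b.ls)],
        [x + y for x, y in zip(a.ss, b.ss)],
        max(a.newest, b.newest),
    )


class EHCFSketch:
    """Hypothetical sketch: one cluster summarized over a sliding window."""

    def __init__(self, window: int, max_per_size: int = 2):
        self.window = window              # window length in time units
        self.max_per_size = max_per_size  # allowed buckets per size (assumed)
        self.buckets: List[Bucket] = []   # ordered oldest first

    def expire(self, now: int) -> None:
        # Drop buckets whose newest point has left the window (now - window, now].
        self.buckets = [b for b in self.buckets if b.newest > now - self.window]

    def insert(self, point: Sequence[float], t: int) -> None:
        self.expire(t)
        self.buckets.append(Bucket(1, list(point), [x * x for x in point], t))
        self._compact()

    def _compact(self) -> None:
        # While too many buckets share a size, merge the two oldest of that
        # size into one bucket of double size, then check the next size up.
        size = 1
        while True:
            idx = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(idx) <= self.max_per_size:
                break
            i, j = idx[0], idx[1]  # adjacent, since sizes shrink toward the newest end
            self.buckets[i:j + 1] = [merge(self.buckets[i], self.buckets[j])]
            size *= 2

    def count(self) -> int:
        return sum(b.n for b in self.buckets)

    def centroid(self) -> List[float]:
        n = self.count()
        dims = len(self.buckets[0].ls)
        return [sum(b.ls[d] for b in self.buckets) / n for d in range(dims)]
```

Because a whole bucket is dropped only once its newest point has expired, the oldest surviving bucket may still contain some points that are already outside the window. The summary is therefore approximate in general, which is the usual trade-off of exponential histograms for bounded memory.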

