
Computer Engineering

• Artificial Intelligence and Recognition Technology •

Data Stream Clustering Algorithm Based on Double-layer Grid and Density

WANG Zhi-he, YANG Yan

  1. (College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2013-09-12 Online: 2014-04-15 Published: 2014-04-14
  • About the authors: WANG Zhi-he (born 1965), male, professor; research interests: database technology and data mining. YANG Yan, master's student.
  • Funding:
    National Natural Science Foundation of China project "Research on Ecological Security Early Warning and Simulation Regulation in Key National Ecological Function Zones of Western China" (71263045).

Abstract: Traditional grid-based data stream clustering algorithms cluster on a grid of a single granularity, which improves processing speed but yields low clustering accuracy. To address this problem, a new data stream clustering algorithm based on a double-layer grid and density, DBG-Stream, is proposed. The algorithm clusters the data stream on grids of two granularities and, borrowing the idea of the CluStream algorithm, divides the clustering process into two stages. In the online stage, coarse-grained grid cells are used to form the initial clusters; in the offline stage, grid cells located on cluster boundaries are clustered a second time on fine-grained grid cells to improve clustering accuracy. The algorithm also sets its key parameters automatically and improves efficiency through a grid deletion strategy. Experimental results show that DBG-Stream achieves considerably higher clustering accuracy than the D-Stream algorithm and effectively addresses the low clustering accuracy of traditional grid-based clustering algorithms.

Key words: data mining, data stream, clustering, clustering analysis, density, double-layer grid
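The two-granularity idea in the abstract can be illustrated with a small sketch. It is not the DBG-Stream algorithm itself: the abstract gives no formulas, so the fixed count thresholds (`min_pts`, `fine_min_pts`), the definition of a boundary cell (a dense coarse cell with at least one non-dense neighbor), the noise label `-1`, and all function names are assumptions made for illustration. Stream-specific parts such as density decay, automatic parameter setting, and the grid deletion strategy are left out.

```python
import math
from collections import Counter
from itertools import product


def cell_of(point, size):
    """Map a point to the integer index of the grid cell that contains it."""
    return tuple(math.floor(x / size) for x in point)


def neighbors(cell):
    """Yield every cell adjacent to `cell` (sharing a face, edge, or corner)."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if any(offset):
            yield tuple(c + o for c, o in zip(cell, offset))


def coarse_clusters(points, coarse, min_pts):
    """First stage (assumed form): label connected groups of dense coarse cells."""
    counts = Counter(cell_of(p, coarse) for p in points)
    dense = {c for c, n in counts.items() if n >= min_pts}
    labels, next_label = {}, 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # flood fill over adjacent dense cells
            cur = stack.pop()
            for nb in neighbors(cur):
                if nb in dense and nb not in labels:
                    labels[nb] = next_label
                    stack.append(nb)
        next_label += 1
    return labels


def refine_boundary(points, coarse, fine, labels, fine_min_pts):
    """Second stage (assumed form): inside boundary coarse cells, keep only
    points that fall in sufficiently dense fine cells; mark the rest as noise."""
    fine_counts = Counter(cell_of(p, fine) for p in points)
    result = []
    for p in points:
        c = cell_of(p, coarse)
        label = labels.get(c, -1)
        on_boundary = label != -1 and any(nb not in labels for nb in neighbors(c))
        if on_boundary and fine_counts[cell_of(p, fine)] < fine_min_pts:
            label = -1  # sparse fine cell inside a boundary cell
        result.append(label)
    return result


# Two tight groups plus one stray point that shares a coarse cell with the first.
points = [
    (0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.2, 0.2),  # group A
    (0.9, 0.9),                                      # stray point
    (5.1, 5.1), (5.2, 5.1), (5.1, 5.2), (5.2, 5.2),  # group B
]
coarse_labels = coarse_clusters(points, coarse=1.0, min_pts=4)
coarse_only = [coarse_labels.get(cell_of(p, 1.0), -1) for p in points]
refined = refine_boundary(points, coarse=1.0, fine=0.5,
                          labels=coarse_labels, fine_min_pts=2)
print(coarse_only)
print(refined)
```

With the coarse grid alone, the stray point is absorbed into group A because it shares a coarse cell with it. The fine-grained pass over boundary cells separates it as noise, which is the kind of accuracy gain the abstract attributes to the second layer.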
