
Computer Engineering ›› 2006, Vol. 32 ›› Issue (21): 76-78. doi: 10.3969/j.issn.1000-3428.2006.21.027

• Software Technology and Database •

A High-Performance Density-Based Incremental Clustering Algorithm

LIU Jianye1,2, LI Fang1

  (1. Dept. of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030; 2. Oracle China, Shanghai 200021)
  • Received:1900-01-01 Revised:1900-01-01 Online:2006-11-05 Published:2006-11-05

An Efficient Incremental Algorithm for Clustering Based on Density

LIU Jianye1,2, LI Fang1   

  (1. Dept. of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030; 2. Oracle China, Shanghai 200021)
  • Received:1900-01-01 Revised:1900-01-01 Online:2006-11-05 Published:2006-11-05

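Abstract: A high-performance density-based incremental clustering algorithm is proposed and proved. The main work of the algorithm includes: (1) extracting and cleaning the data using partitioning and sampling techniques; (2) clustering the data using density and grid techniques; (3) an incremental algorithm for the case where the threshold changes, which recomputes the clustering only for the affected points; (4) an incremental clustering algorithm for data insertions and deletions in a dynamic environment. Experiments show that the algorithm handles high-dimensional data well, filters noisy data effectively, and greatly reduces clustering time.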

Keywords: data mining, clustering algorithm, density, incremental algorithm

Abstract: A highly efficient density-based incremental clustering algorithm is presented. The main ideas are as follows: (1) The data are sampled using partitioning and sampling techniques. (2) The data are clustered based on density and a grid. (3) When the threshold is adjusted, an incremental algorithm recalculates only the affected data. (4) After data are inserted or deleted in a dynamic environment, an incremental algorithm re-clusters the data. Experiments show that the new algorithm efficiently processes high-dimensional data with noise and greatly speeds up mining.


Key words: Data mining, Clustering algorithm, Density, Incremental algorithm
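
The abstract names grid- and density-based clustering with incremental updates but gives no procedure, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes a uniform grid, treats a cell as dense when its point count reaches a threshold, and joins adjacent dense cells into clusters. All names (`cell_of`, `adjacent`, `count_cells`, `cluster_cells`, `insert_point`) and parameters (`cell_size`, `threshold`) are hypothetical. Only insertion is handled incrementally here; deletions and threshold changes can split clusters and would need extra logic.

```python
from collections import deque


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // cell_size) for x in point)


def adjacent(a, b):
    """True if two distinct cells differ by at most 1 in every dimension."""
    return a != b and all(abs(x - y) <= 1 for x, y in zip(a, b))


def count_cells(points, cell_size):
    """Count how many points fall into each grid cell."""
    counts = {}
    for p in points:
        c = cell_of(p, cell_size)
        counts[c] = counts.get(c, 0) + 1
    return counts


def cluster_cells(counts, threshold):
    """Label connected groups of dense cells; sparse cells stay unlabeled (noise)."""
    dense = sorted(c for c, n in counts.items() if n >= threshold)
    labels = {}
    next_label = 0
    for start in dense:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first search over adjacent dense cells
            cur = queue.popleft()
            for other in dense:
                if other not in labels and adjacent(cur, other):
                    labels[other] = next_label
                    queue.append(other)
        next_label += 1
    return labels


def insert_point(counts, labels, point, cell_size, threshold):
    """Add one point; relabel only if its cell has just become dense."""
    cell = cell_of(point, cell_size)
    counts[cell] = counts.get(cell, 0) + 1
    if counts[cell] != threshold:
        return  # density status unchanged, so no cluster is affected
    # Labels of dense cells touching the newly dense cell.
    touching = {lab for c, lab in labels.items() if adjacent(cell, c)}
    if touching:
        new_label = min(touching)  # join, and possibly merge, neighboring clusters
    else:
        new_label = max(labels.values(), default=-1) + 1  # start a new cluster
    for c in list(labels):
        if labels[c] in touching:
            labels[c] = new_label
    labels[cell] = new_label
```

In this sketch an insertion that does not push a cell across the threshold costs a single counter update, which is the kind of saving an incremental scheme aims for. Scanning the dense cells for adjacency avoids enumerating all 3^d neighbor offsets in d dimensions, at the cost of a pass over the dense cells.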
