
Computer Engineering

• Advanced Computing and Data Processing •

Research on Self-adaptive Spatial Clustering Method Based on Varied Density

YANG Ya-jun, ZHANG Kun-long, YANG Xiao-ke

  1. (School of Computer Science and Technology, Tianjin University, Tianjin 300072, China)
  • Received: 2013-07-01 Online: 2014-08-15 Published: 2014-08-15
  • About the authors: YANG Ya-jun (born 1988), male, master's degree, main research interests: database technology and data mining; ZHANG Kun-long, associate professor; YANG Xiao-ke, master's degree.
  • Funding:
    National Natural Science Foundation of China (11003027).

Abstract: DBSCAN cannot find clusters of varied densities and is sensitive to its parameters. To address this, the paper proposes a self-adaptive spatial clustering method based on varied density. The method uses the density change rate to identify the boundaries between clusters of different densities, and it adjusts its parameter values automatically at runtime. Specifically, the density of a point is defined as the distance from that point to its k-th nearest neighbor (kNN). If the density change rate between a point and one of its nearest neighbors is less than a user-given threshold, that neighbor is called a similar neighbor. A core point is redefined as a point that has at least k similar neighbors among its nearest neighbors. On this basis, DBSCAN is applied as a breadth-first search that marks core points which are similar in density and reachable in distance, together with their nearest neighbors, as the same cluster. When similar neighbors are judged, the parameter values are adjusted automatically according to the average density and the density change rate of the core points already added. Experimental results show that the method accurately finds clusters of arbitrary shape, size, and density and eliminates outliers, and that the self-adaptive mechanism makes suitable parameters easier to set.

Key words: self-adaptation; varied density; k Nearest Neighbor (kNN); clustering; data mining
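The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the change-rate formula (here the absolute density difference divided by the larger of the two densities), the size of the nearest-neighbor list (`n_neighbors`), and the adaptive rule (here a candidate core point is compared with the running mean density of the core points already in the cluster). The names `adaptive_density_cluster`, `knn_density`, `change_rate`, and `max_rate` are hypothetical.

```python
from collections import deque

import numpy as np


def knn_density(points, k, n_neighbors):
    """Return (density, nbrs): k-th nearest-neighbor distance per point and
    the indices of each point's n_neighbors nearest neighbors."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # brute-force pairwise distances
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1, kind="stable")[:, :n_neighbors]
    density = dist[np.arange(len(points)), nbrs[:, k - 1]]
    return density, nbrs


def change_rate(a, b):
    """Assumed formula: relative difference of two densities, in [0, 1]."""
    top = max(a, b)
    return 0.0 if top == 0 else abs(a - b) / top


def adaptive_density_cluster(points, k, n_neighbors, max_rate):
    """Label each point with a cluster id, or -1 for outliers."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    if not (1 <= k <= n_neighbors < n):
        raise ValueError("need 1 <= k <= n_neighbors < number of points")
    density, nbrs = knn_density(points, k, n_neighbors)

    # Core point: at least k of its nearest neighbors are similar neighbors.
    is_core = np.zeros(n, dtype=bool)
    for p in range(n):
        similar = sum(change_rate(density[p], density[q]) < max_rate
                      for q in nbrs[p])
        is_core[p] = similar >= k

    labels = np.full(n, -1)
    cluster_id = 0
    for seed in range(n):
        if not is_core[seed] or labels[seed] != -1:
            continue
        labels[seed] = cluster_id
        core_densities = [density[seed]]         # densities of joined core points
        queue = deque([seed])
        while queue:                             # breadth-first expansion
            p = queue.popleft()
            for q in nbrs[p]:
                if labels[q] != -1:
                    continue
                if not is_core[q]:
                    labels[q] = cluster_id       # border point: labeled, not expanded
                    continue
                # Assumed adaptive rule: compare against the running mean
                # density of the core points already in this cluster.
                mean_density = sum(core_densities) / len(core_densities)
                if change_rate(mean_density, density[q]) < max_rate:
                    labels[q] = cluster_id
                    core_densities.append(density[q])
                    queue.append(q)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    # Two runs of points with different spacing plus one distant point.
    dense = [(float(x), 0.0) for x in range(10)]
    sparse = [(100.0 + 10.0 * i, 0.0) for i in range(10)]
    data = dense + sparse + [(1000.0, 0.0)]
    print(adaptive_density_cluster(data, k=2, n_neighbors=3, max_rate=0.6))
```

In the toy data at the bottom, the two runs of points differ in spacing by a factor of ten, so a single DBSCAN radius would not suit both. Under the assumptions above, the sketch should give each run its own label and leave the distant point labeled -1. The pairwise distance matrix costs quadratic memory, so a spatial index would be the natural replacement for larger inputs.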
