Research on Self-adaptive Spatial Clustering Method Based on Varied Density

doi:10.3969/j.issn.1000-3428.2014.08.012

Computer Engineering (计算机工程)

YANG Ya-jun, ZHANG Kun-long, YANG Xiao-ke

(School of Computer Science and Technology, Tianjin University, Tianjin 300072, China)

Received: 2013-07-01 Online: 2014-08-15 Published: 2014-08-15
About the authors: YANG Ya-jun (b. 1988), male, master's degree, main research interests: database technology and data mining; ZHANG Kun-long, associate professor; YANG Xiao-ke, master's degree.
Funding: National Natural Science Foundation of China (11003027).

Abstract

DBSCAN cannot find clusters of varied densities and is sensitive to its parameters. To address this, the paper proposes a self-adaptive spatial clustering method based on varied density. The algorithm uses the density change rate to find the boundaries between clusters of different densities, and it adjusts its parameter values automatically at runtime. Specifically, it defines the density of a point as the distance from that point to its k-th nearest neighbor. If the density change rate between a point and one of its nearest neighbors is less than a user-given threshold, that neighbor is called a similar neighbor. A core point is redefined as a point that has at least k similar neighbors among its nearest neighbors. With these modifications, the method applies DBSCAN's breadth-first search and marks core points that are similar in density and reachable by distance, together with their nearest neighbors, as the same cluster. When judging similar neighbors, the algorithm automatically adjusts the parameter values according to the average density and density change rate of the core points already added. Experimental results show that the method can accurately find clusters of arbitrary shape, size and density and can eliminate outliers. In addition, the self-adaptive mechanism makes suitable parameters easier to set than in other algorithms.

Key words: self-adaption; varied density; k Nearest Neighbor (kNN); clustering; data mining
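The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the function names, the `n_neighbors` parameter (how many nearest neighbors are examined per point), the symmetric change-rate formula `|a - b| / max(a, b)`, and the handling of non-core neighbors as unexpanded border points. The runtime self-adjustment of parameters is omitted entirely.

```python
from collections import deque
from math import dist


def change_rate(a, b):
    # Assumed formula: relative difference of two densities, in [0, 1).
    hi = max(a, b)
    return 0.0 if hi == 0 else abs(a - b) / hi


def cluster_varied_density(points, k, n_neighbors, max_change_rate):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for outliers.

    Requires k <= n_neighbors < len(points).
    """
    n = len(points)
    # Brute-force nearest neighbors of each point, excluding the point itself.
    neighbors = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: dist(points[i], points[j]))[:n_neighbors]
        for i in range(n)
    ]
    # Density of a point: distance to its k-th nearest neighbor.
    density = [dist(points[i], points[neighbors[i][k - 1]]) for i in range(n)]

    def similar(i, j):
        return change_rate(density[i], density[j]) < max_change_rate

    def is_core(i):
        # Core point: at least k similar neighbors among its nearest neighbors.
        return sum(similar(i, j) for j in neighbors[i]) >= k

    labels = [-1] * n
    cluster_id = 0
    for seed in range(n):
        if labels[seed] != -1 or not is_core(seed):
            continue
        labels[seed] = cluster_id
        queue = deque([seed])
        while queue:  # breadth-first expansion, as in DBSCAN
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] != -1:
                    continue
                if not is_core(q):
                    labels[q] = cluster_id  # border point, not expanded
                elif similar(p, q):
                    labels[q] = cluster_id
                    queue.append(q)
        cluster_id += 1
    return labels


# A dense run (spacing 1), a sparse run (spacing 5), and one far-away point.
xs = [0, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 1000]
points = [(float(x), 0.0) for x in xs]
print(cluster_varied_density(points, k=2, n_neighbors=2, max_change_rate=0.55))
# prints [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

In this toy data the point at x = 10 has the dense point at x = 5 among its nearest neighbors, but their densities differ by a change rate of 0.6, so it is not a core point and the expansion of the sparse cluster does not leak into the dense one. The isolated point at x = 1000 is never reached and keeps the outlier label.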

CLC number: TP181

YANG Ya-jun, ZHANG Kun-long, YANG Xiao-ke. Research on Self-adaptive Spatial Clustering Method Based on Varied Density[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2014.08.012.

http://www.ecice06.com/CN/Y2014/V40/I8/58

