Improved Fuzzy C-means Clustering Algorithm Based on Density-Sensitive Distance


Computer Engineering, 2021, Vol. 47, Issue (5): 88-96,103. doi: 10.19678/j.issn.1000-3428.0057901


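WANG Zhihe, WANG Shuyan, DU Hui

School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China

Received: 2020-03-30 Revised: 2020-04-30 Published: 2020-05-09
About the authors: WANG Zhihe (born 1965), male, professor, main research interest: data mining; WANG Shuyan, master's student; DU Hui, associate professor, Ph.D.
Funding:
National Natural Science Foundation of China (61962054).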


Abstract

Abstract: The fuzzy C-means (FCM) clustering algorithm cannot identify non-convex data, and its similarity measure based on Euclidean distance considers only the local consistency of data points while ignoring their global consistency. To address this, the paper proposes an improved FCM algorithm that builds the similarity matrix from a density-sensitive distance measure. Classical FCM requires the number of clusters to be set manually in advance and selects the initial cluster centers at random, which makes its clustering results unstable. The proposed algorithm therefore uses the Affinity Propagation (AP) algorithm to obtain a coarse number of clusters, which serves as the upper limit of the search range for the optimal cluster number. On this basis, the maximum-minimum distance algorithm is improved to select representative sample points as the initial cluster centers, and the optimal cluster number is determined automatically using the silhouette coefficient. Experimental results on UCI and artificial data sets show that, compared with the classical FCM, K-means and CFSFDP algorithms, the proposed algorithm can identify complex non-convex data and converges faster while maintaining clustering performance and stability.

Key words: fuzzy C-means (FCM) clustering algorithm, density-sensitive distance, Affinity Propagation (AP), initial clustering center, silhouette coefficient
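The abstract does not give formulas, so the following is only an illustrative sketch of the ideas it names, not the authors' implementation. It assumes one common formulation of the density-sensitive distance from the spectral-clustering literature (edge length `rho**d - 1` with `rho > 1`, then shortest paths over the complete graph), the textbook maximum-minimum (farthest-first) rule rather than the paper's improved variant, and the standard FCM membership update applied to point-to-center distances with centers restricted to data points. The function names, the `rho` and `m` defaults, and the toy data are all hypothetical.

```python
import math


def density_sensitive_distances(points, rho=2.0):
    """All-pairs density-sensitive distances (assumed formulation).

    The edge length between two points is rho**euclid(a, b) - 1 with rho > 1.
    The distance is the shortest path over the complete graph, computed with
    Floyd-Warshall (O(n^3), acceptable for small illustrative inputs).
    """
    n = len(points)
    dist = [[rho ** math.dist(points[i], points[j]) - 1.0 for j in range(n)]
            for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


def max_min_centers(dist, k, first=0):
    """Textbook maximum-minimum selection of k initial centers (indices).

    Each new center is the point whose distance to its nearest already-chosen
    center is largest. This is not the improved rule proposed in the paper.
    """
    centers = [first]
    while len(centers) < k:
        candidates = [i for i in range(len(dist)) if i not in centers]
        nxt = max(candidates, key=lambda i: min(dist[i][c] for c in centers))
        centers.append(nxt)
    return centers


def fcm_memberships(dist, centers, m=2.0):
    """Standard FCM membership update from point-to-center distances."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for i in range(len(dist)):
        d = [dist[i][c] for c in centers]
        zeros = d.count(0.0)
        if zeros:  # point coincides with a center: share membership crisply
            memberships.append([1.0 / zeros if x == 0.0 else 0.0 for x in d])
        else:
            memberships.append(
                [1.0 / sum((dj / dl) ** exponent for dl in d) for dj in d]
            )
    return memberships


# Toy data (hypothetical): two parallel chains of 9 points each, 3 units apart.
points = [(0.5 * i, 0.0) for i in range(9)] + [(0.5 * i, 3.0) for i in range(9)]
dist = density_sensitive_distances(points)
centers = max_min_centers(dist, k=2)
u = fcm_memberships(dist, centers)
```

In this toy layout the two ends of a chain are farther apart in Euclidean terms than the two chains are from each other, yet the density-sensitive distance along the densely sampled chain is shorter than the jump across the gap, so the far end of the first chain still receives its largest membership for the center chosen on that chain.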

CLC number: TP18

WANG Zhihe, WANG Shuyan, DU Hui. Improved Fuzzy C-means Clustering Algorithm Based on Density-Sensitive Distance[J]. Computer Engineering, 2021, 47(5): 88-96,103.

https://www.ecice06.com/CN/Y2021/V47/I5/88

