Computer Engineering ›› 2021, Vol. 47 ›› Issue (5): 88-96,103. doi: 10.19678/j.issn.1000-3428.0057901

• Artificial Intelligence and Pattern Recognition •

Improved Fuzzy C-means Clustering Algorithm Based on Density-Sensitive Distance

WANG Zhihe, WANG Shuyan, DU Hui

  1. School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
  • Received: 2020-03-30 Revised: 2020-04-30 Published: 2020-05-09
  • About the authors: WANG Zhihe (born 1965), male, professor, main research interest: data mining; WANG Shuyan, master's student; DU Hui, associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61962054).

Abstract: The Fuzzy C-means (FCM) clustering algorithm cannot identify non-convex data, and its Euclidean-distance similarity measure considers only the local consistency of data points while ignoring their global consistency. This paper proposes an improved FCM algorithm that builds the similarity matrix from a density-sensitive distance measure. Classical FCM requires the number of clusters to be set manually in advance and selects the initial cluster centers at random, which makes its results unstable. To address this, the proposed algorithm runs the Affinity Propagation (AP) algorithm to obtain a coarse number of clusters, which serves as the upper limit of the search range for the optimal number of clusters. On this basis, an improved maximum-minimum distance algorithm selects representative sample points as the initial cluster centers, and the optimal number of clusters is determined automatically using the silhouette coefficient. Experimental results on UCI and artificial data sets show that, compared with the classical FCM, K-means, and CFSFDP algorithms, the proposed algorithm can identify complex non-convex data and converges faster while maintaining clustering performance and stability.

Key words: Fuzzy C-means (FCM) clustering algorithm, density-sensitive distance, Affinity Propagation (AP), initial clustering center, silhouette coefficient
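
The abstract does not state the distance formula or the details of the improved maximum-minimum step, so the following is only an illustrative sketch under stated assumptions. It assumes one common form of density-sensitive distance: every pair of points is joined by an edge of length `rho ** d - 1` (with `d` the Euclidean distance and `rho > 1`), and the distance between two points is the shortest path length. The seeding function is the classical max-min (farthest-first) rule, not the paper's improved variant. The names `density_sensitive_distances`, `maxmin_centers`, `rho`, and `first` are hypothetical.

```python
import heapq
import math


def density_sensitive_distances(points, rho=2.0):
    """All-pairs density-sensitive distances for a small list of points.

    Assumed definition (not given in the abstract): each edge (i, j) of the
    complete graph has length rho ** d(i, j) - 1, where d is the Euclidean
    distance and rho > 1; the distance between two points is the length of
    the shortest path between them.
    """
    n = len(points)
    # Stretched edge lengths: long hops are penalised exponentially.
    edge = [[rho ** math.dist(p, q) - 1.0 for q in points] for p in points]
    result = []
    for src in range(n):
        # Dijkstra from src over the complete graph.
        best = [math.inf] * n
        best[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > best[u]:
                continue  # stale heap entry
            for v in range(n):
                nd = d + edge[u][v]
                if nd < best[v]:
                    best[v] = nd
                    heapq.heappush(heap, (nd, v))
        result.append(best)
    return result


def maxmin_centers(dist, k, first=0):
    """Classical max-min (farthest-first) seeding on a distance matrix.

    This is the textbook rule, shown only for orientation; the paper's
    improved version is not described in the abstract.
    """
    centers = [first]
    while len(centers) < k:
        # For each point, its distance to the nearest chosen centre.
        nearest = [min(dist[i][c] for c in centers) for i in range(len(dist))]
        centers.append(max(range(len(dist)), key=nearest.__getitem__))
    return centers


if __name__ == "__main__":
    # Four points on a chain plus one isolated point. Points 3 and 4 are
    # both at Euclidean distance 3 from point 0.
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
    D = density_sensitive_distances(pts, rho=2.0)
    print(D[0][3], D[0][4])      # 3.0 7.0
    print(maxmin_centers(D, 2))  # [0, 4]
```

In this toy example, point 3 is reached through a chain of short hops (three edges of length 1), while the isolated point 4 requires a single long hop of length 7, even though both are equally far from point 0 in Euclidean terms. This is the sense in which such a measure reflects global consistency along dense regions rather than only local proximity.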

CLC Number: