Computer Engineering ›› 2019, Vol. 45 ›› Issue (6): 75-81. doi: 10.19678/j.issn.1000-3428.0051839

• Advanced Computing and Data Processing •

Extended incremental fuzzy clustering algorithm for sparse high-dimensional big data

QIAN Xuezhong, YAO Linyan

  1. Engineering Research Center of IoT Technology and Application, Ministry of Education, College of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2018-06-15 Online: 2019-06-15 Published: 2019-06-15
  • About the authors: QIAN Xuezhong (born 1967), male, associate professor, M.S.; main research interests: data mining, database technology and network security. YAO Linyan (corresponding author), master's student.
  • Supported by:

    National Natural Science Foundation of China (61673193); Fundamental Research Funds for the Central Universities (JUSRP51510, JUSRP51635B).

Abstract:

The Fuzzy C-Means (FCM) clustering algorithm is sensitive to the initial centers, ignores the mutual influence between the centers of different classes, and can only handle low-dimensional data. To address these problems, an improved initial-center selection method is designed and, based on the idea of conditional fuzzy clustering, the Euclidean distance in the traditional FCM algorithm is replaced with the cosine distance, yielding the wHFCLM algorithm. Combining this algorithm with the extended incremental clustering algorithms spFCM, oFCM and rseFCM gives the corresponding extended incremental fuzzy clustering algorithms spHF(c+l)M, oHF(c+l)M and rseHF(c+l)M. Experimental results show that, compared with spFCM, oFCM and rseFCM, the extended incremental fuzzy clustering algorithms are less sensitive to the selection of initial centers, handle large-scale sparse high-dimensional data sets better, and achieve better clustering performance when the block size is chosen appropriately.
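The central substitution named in the abstract, replacing the Euclidean distance in FCM with the cosine distance, can be illustrated with a minimal sketch. This is not the authors' wHFCLM algorithm: it leaves out the improved initial-center selection, the conditional clustering term, the point weights and the block-wise incremental processing, none of which the abstract specifies. The function name `cosine_fcm`, its parameters (`m`, `n_iter`, `init`, `eps`) and the use of dense NumPy arrays are assumptions made only for illustration.

```python
import numpy as np

def _unit_rows(A, eps):
    # Scale each row to unit length (rows of all zeros are left near zero).
    return A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), eps)

def _memberships(Xn, V, m, eps):
    # Cosine dissimilarity 1 - cos(x, v); rows of Xn and V are unit vectors.
    D = np.maximum(1.0 - Xn @ V.T, eps)
    W = D ** (-1.0 / (m - 1.0))
    # Normalize so the memberships of every point sum to 1.
    return W / W.sum(axis=1, keepdims=True)

def cosine_fcm(X, c, m=2.0, n_iter=50, init=None, seed=0, eps=1e-12):
    """Plain fuzzy c-means with cosine dissimilarity (illustrative sketch only)."""
    Xn = _unit_rows(np.asarray(X, dtype=float), eps)
    if init is None:
        # Assumed initialization: c distinct data points chosen at random.
        rng = np.random.default_rng(seed)
        V = Xn[rng.choice(len(Xn), size=c, replace=False)]
    else:
        V = _unit_rows(np.asarray(init, dtype=float), eps)
    for _ in range(n_iter):
        U = _memberships(Xn, V, m, eps)
        # Center update: membership-weighted sum of points, projected back
        # onto the unit sphere.
        V = _unit_rows((U ** m).T @ Xn, eps)
    return _memberships(Xn, V, m, eps), V

# Usage on six toy points that lie along two directions.
X = np.array([[1.0, 0.1], [1.0, 0.0], [0.9, 0.1],
              [0.0, 1.0], [0.1, 1.0], [0.1, 0.9]])
U, V = cosine_fcm(X, c=2, init=X[[0, 3]])
print(U.argmax(axis=1))
```

Because every row is scaled to unit length, the dissimilarity depends only on direction and not on vector magnitude, which is the usual motivation for a cosine measure on sparse high-dimensional data. For data that is actually sparse, a sparse matrix type would be needed in place of the dense arrays used here.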

Key words: extended clustering algorithm, conditional clustering, sparse high-dimensional big data, fuzzy clustering, initial center

CLC number: