
Computer Engineering

• Artificial Intelligence and Recognition Technology

基于减法聚类的合并最优路径层次聚类算法

朱 琪,张会福,杨宇波,杨泉清   

  1. (湖南科技大学计算机科学与工程学院,湖南湘潭411201)
  • Received: 2014-06-04 Published: 2015-06-15 Released: 2015-06-15
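  • About the authors: ZHU Qi (b. 1990), female, master's degree, main research interest: data mining; ZHANG Huifu, associate professor, Ph.D.; YANG Yubo and YANG Quanqing, master's degree.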
  • Supported by:

    National Natural Science Foundation of China (51175169); National Key Technology R&D Program of China (2012BAF02B01).

Combined Optimal Path Hierarchical Clustering Algorithm Based on Subtractive Clustering

ZHU Qi,ZHANG Huifu,YANG Yubo,YANG Quanqing   

  1. (School of Computer Science and Engineering,Hunan University of Science and Technology,Xiangtan 411201,China)
  • Received:2014-06-04 Online:2015-06-15 Published:2015-06-15

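Abstract (translated from the Chinese):

To address the low efficiency of traditional hierarchical clustering algorithms on large-scale data, a fast hierarchical clustering algorithm is proposed. Initial cluster centers are determined one after another in order of data-point density. A minimum spanning tree algorithm is used to store the similarity distances between the initial cluster centers and to find the optimal merging path, which reduces the computation and space complexity of updating the distance matrix. The convergence function in subtractive clustering is also optimized. Experimental results on UCI datasets show that the algorithm runs faster and more efficiently than traditional clustering algorithms, and that its advantage in time consumption becomes more pronounced as the amount of data grows.

Keywords: initial cluster center, optimal path, fast clustering, large dataset, hierarchical clustering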

Abstract:

To address the heavy computational cost of the traditional Hierarchical Clustering (HC) algorithm, this paper proposes a fast clustering algorithm. The algorithm determines the initial cluster centers sequentially according to the density values of the data points. To overcome the drawback that HC must update the distance matrix after every merge, it uses a minimum spanning tree algorithm to store the similarity distances between the initial cluster centers and to find the optimal merging path, which reduces the computation and space complexity of updating the distance matrix. It also optimizes the convergence function of subtractive clustering. Experimental results on UCI datasets show that the algorithm is faster and more efficient than the traditional clustering algorithm, and that its advantage in time consumption becomes more pronounced as the amount of data increases.

Key words: initial clustering center, optimal path, fast clustering, large dataset, Hierarchical Clustering (HC)
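
The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the density and density-reduction formulas are the commonly used subtractive clustering ones, the radii `ra` and `rb` and the stopping ratio `stop_ratio` are hypothetical parameters, and merging the centers along minimum spanning tree edges from shortest to longest is only an assumed reading of the "optimal merging path". The paper's optimized convergence function is not reproduced here.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subtractive_centers(points, ra=1.0, rb=1.5, stop_ratio=0.15):
    """Pick initial centers in decreasing order of density.

    ra, rb and stop_ratio are illustrative defaults, not values from the paper.
    """
    alpha = 4.0 / ra ** 2
    beta = 4.0 / rb ** 2
    # Density of a point: sum of Gaussian-weighted contributions of all points.
    density = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
               for p in points]
    first_peak = max(density)
    centers = []
    while True:
        i = max(range(len(points)), key=density.__getitem__)
        peak = density[i]
        if peak < stop_ratio * first_peak:  # simple stopping rule (assumption)
            break
        center = points[i]
        centers.append(center)
        # Lower the density around the new center so the next pick lies elsewhere.
        density = [d - peak * math.exp(-beta * sq_dist(p, center))
                   for d, p in zip(density, points)]
    return centers


def mst_edges(centers):
    """Prim's algorithm on the complete graph of centers: (dist, i, j) edges."""
    n = len(centers)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.sqrt(sq_dist(centers[u], centers[v]))
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def merge_along_mst(centers, k):
    """Merge centers along MST edges, shortest first, until k groups remain."""
    root = list(range(len(centers)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    groups = len(centers)
    for _, i, j in sorted(mst_edges(centers)):
        if groups <= k:
            break
        # MST edges never close a cycle, so the two roots always differ.
        root[find(j)] = find(i)
        groups -= 1
    return [find(i) for i in range(len(centers))]
```

Because only the MST edges over the initial centers are stored and visited, no full distance matrix has to be recomputed after each merge. A call such as `merge_along_mst(subtractive_centers(points), k)` returns one group label per initial center; assigning each remaining data point to its nearest center is left out of the sketch.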

CLC number: