作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2006, Vol. 32 ›› Issue (12): 191-192,195.

• 人工智能及识别技术 • 上一篇    下一篇

一种基于 EPDS 的快速K 均值聚类算法

陈作平,叶正麟,刘 明   

  1. 西北工业大学理学院,西安 710072
  • 出版日期:2006-06-20 发布日期:2006-06-20

An Algorithm for Fast K-mean Clustering Based on EPDS

CHEN Zuoping, YE Zhenglin, LIU Ming   

  1. School of Science, Northwestern Polytechnical University, Xi’an 710072
  • Online:2006-06-20 Published:2006-06-20

摘要: K-均值聚类是经常使用的一种数据聚类方法,但对大数据量情形,其聚类过程较慢,主要原因在于聚类过程中每个待聚类向量要反复进行一个最近邻搜索过程,以寻找与其距离最近的聚类中心;据此,文章提出使用扩展的部分失真搜索(Extended Partial DistortionSearch, EPDS)来完成该最近邻搜索,极大地减少了完成聚类所需乘法次数。实验表明,相对于基本的K 均值聚类算法,该方法可以节约1/3 以上的计算量。

关键词: K 均值聚类;扩展的部分失真搜索;最近邻搜索

Abstract: K-mean clustering is a popular method for data clustering, however, it suffers from low speed in the situation of large mount of data, mainly due to its repeatedly solving a problem of nearest neighbor search (NNS) for each vector to find its nearest cluster center in clustering. This paper proposes a fast algorithm for K-mean clustering, based on extended partial distortion search (EPDS), which is used to accelerate the process of NNS, by means of reducing its multiplications greatly. Experiment results indicate that the proposed algorithm can save at least 1/3 multiplications relative to the basic K-mean clustering algorithm.

Key words: K-mean clustering; Extended partial distortion search (EPDS); Nearest neighbor search