Computer Engineering ›› 2007, Vol. 33 ›› Issue (04): 11-13. doi: 10.3969/j.issn.1000-3428.2007.04.004

• Doctoral Dissertation •

核聚类算法最佳聚类数的自适应确定方法

普运伟1,2,3,朱 明1,2,金炜东1,胡来招2   

  (1. 西南交通大学信息科学与技术学院,成都 610031;2. 电子对抗国防科技重点实验室,成都 610036;
    3. 昆明理工大学计算中心,昆明 650093)

Self-adaptive Method of Determining Optimal Number of Clusters in Kernel-based Clustering Algorithm

PU Yunwei1,2,3, ZHU Ming1,2, JIN Weidong1, HU Laizhao2   

  (1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031; 2. National Defense Science and Technology Key Laboratory of Electronic Countermeasures, Chengdu 610036; 3. Computer Center, Kunming University of Science and Technology, Kunming 650093)
  • Online: 2007-02-20 Published: 2007-02-20

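Abstract: Based on an analysis of the pairwise similarities between samples that a kernel function describes implicitly, this paper defines, from a statistical viewpoint, within-cluster and between-cluster individual average similarity coefficients, which reflect the similarity of samples inside a cluster and across clusters, respectively, and uses them to design an efficient validity index for evaluating the quality of kernel clustering results. The index has a clear physical meaning, is simple to compute, and is fairly robust to the kernel parameter. Building on this index, a self-adaptive kernel clustering (SAKC) algorithm is proposed that automatically determines the optimal number of clusters and the optimal partition. Experiments on benchmark data sets confirm the effectiveness and good performance of the proposed validity index and of the SAKC algorithm.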

Keywords: kernel clustering, clustering validity, optimal number of clusters, similarity

Abstract: By investigating the pairwise similarities implicitly defined by the kernel function, this paper defines two statistical similarity coefficients, the within-cluster and between-cluster average similarity coefficients, which describe the similarity among data items inside a cluster and across clusters, respectively. An efficient validity index for kernel clustering algorithms is then proposed; it has a clear physical meaning, low computational complexity, and a certain robustness with respect to the Gaussian kernel width. In addition, a self-adaptive kernel clustering (SAKC) algorithm based on the proposed validity index is developed. Benchmark results demonstrate the effectiveness and performance of the new validity index and of the SAKC algorithm.

Key words: Kernel-based clustering, Clustering validity, Optimal number of clusters, Similarity
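
The idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the index is taken to be the within-cluster average similarity minus the between-cluster average similarity, the base clustering is a plain kernel k-means with a deterministic farthest-first initialization, and all function names (`gaussian_kernel`, `average_similarities`, `validity_index`, `kernel_kmeans`, `select_num_clusters`) are hypothetical. It is not the authors' SAKC algorithm, only one way such a scheme might look.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def average_similarities(K, labels):
    """Mean kernel value over same-cluster pairs (i != j) and over cross-cluster pairs."""
    same = labels[:, None] == labels[None, :]
    within_mask = same & ~np.eye(len(labels), dtype=bool)
    between_mask = ~same
    within = K[within_mask].mean() if within_mask.any() else 0.0
    between = K[between_mask].mean() if between_mask.any() else 0.0
    return within, between

def validity_index(K, labels):
    """Assumed index form (not taken from the paper): within minus between."""
    within, between = average_similarities(K, labels)
    return within - between

def farthest_first_centers(K, k):
    """Pick k seed points that are mutually far apart in feature space."""
    diag = np.diag(K)
    centers = [0]
    min_d2 = diag + K[0, 0] - 2.0 * K[:, 0]
    for _ in range(1, k):
        nxt = int(np.argmax(min_d2))
        centers.append(nxt)
        min_d2 = np.minimum(min_d2, diag + K[nxt, nxt] - 2.0 * K[:, nxt])
    return centers

def kernel_kmeans(K, k, n_iter=100):
    """Plain kernel k-means; a stand-in for whatever base clustering the paper uses."""
    diag = np.diag(K)
    centers = farthest_first_centers(K, k)
    d2 = diag[:, None] + diag[centers][None, :] - 2.0 * K[:, centers]
    labels = d2.argmin(axis=1)
    for _ in range(n_iter):
        dist = np.full((K.shape[0], k), np.inf)
        for c in range(k):
            members = labels == c
            if members.any():
                # ||phi(x_i) - mu_c||^2 = K_ii - 2 * mean_j K_ij + mean_jl K_jl
                dist[:, c] = (diag - 2.0 * K[:, members].mean(axis=1)
                              + K[np.ix_(members, members)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def select_num_clusters(X, k_values, sigma=1.0):
    """Cluster for each candidate k and keep the partition with the largest index."""
    K = gaussian_kernel(X, sigma)
    best_score, best_labels = -np.inf, None
    for k in k_values:
        labels = kernel_kmeans(K, k)
        score = validity_index(K, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return len(np.unique(best_labels)), best_labels, best_score

# Toy data: three tight, well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((30, 2)) for c in blob_centers])
best_k, labels, score = select_num_clusters(X, range(2, 7), sigma=1.0)
print(best_k, round(float(score), 3))
```

On this toy data the difference-based index penalizes both merging blobs (within-cluster similarity drops) and splitting a blob (between-cluster similarity rises), which is the behavior one would want from an index of this kind. How the paper actually combines the two coefficients, and how sensitive the result is to `sigma`, would have to be checked against the full text.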