Abstract:
To better determine the optimal number of clusters for multi-dimensional data, this paper proposes a new algorithm, MHC, which builds on the principles of traditional algorithms for determining the number of clusters in a dataset. The algorithm uses a bottom-up strategy to generate partitions of the dataset at different levels. At each level it computes the clustering quality of the partition, and it selects the optimal number of clusters according to that quality. The paper also presents a new clustering validity index, Between-In-Proportion (BIP), which measures the clustering quality of different partitions and depends mainly on the geometric structure of the dataset. Theoretical analysis and experimental results verify the effectiveness and good performance of the new validity index and the MHC algorithm.
Key words:
multi-dimensional dataset,
number of clusters,
clustering validity index,
hierarchical clustering
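Abstract (Chinese edition): Building on the principles of traditional algorithms for determining the number of clusters in a dataset, a new algorithm, MHC, is proposed. The algorithm uses a bottom-up strategy to generate partitions of the dataset at different levels, computes the clustering quality of the partition at each level, and selects the optimal number of clusters according to that quality. A new validity index, BIP, is also designed to measure the clustering quality of different partitions; it relies mainly on the geometric structure of the dataset. Experimental results show that the algorithm accurately determines the optimal number of clusters in multi-dimensional datasets.
Key words:
multi-dimensional dataset,
number of clusters,
clustering validity index,
hierarchical clustering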
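The bottom-up procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the BIP formula or the merge criterion used by MHC, so everything below is an assumption made for illustration: clusters are merged by closest centroids, and `between_within_ratio` is a simple stand-in score (smallest centroid separation divided by largest mean within-cluster spread), not the BIP index. All function names and the `k_max` parameter are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def bottom_up_levels(data):
    """Yield one partition per level, from n-1 clusters down to 2 clusters.

    Assumption: at each level the two clusters with the closest centroids
    are merged. This is a naive O(n^3) loop, meant for illustration only.
    """
    clusters = [[p] for p in data]
    while len(clusters) > 2:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: math.dist(
                centroid(clusters[ij[0]]), centroid(clusters[ij[1]])
            ),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        yield clusters


def between_within_ratio(clusters):
    """Stand-in quality score (NOT the paper's BIP index).

    Smallest distance between cluster centroids divided by the largest
    mean point-to-centroid distance inside any cluster; higher is better.
    """
    cents = [centroid(c) for c in clusters]
    between = min(math.dist(a, b) for a, b in combinations(cents, 2))
    within = max(
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    )
    return between / within if within > 0 else float("inf")


def choose_cluster_number(data, k_max):
    """Score every level with at most k_max clusters and keep the best one."""
    candidates = [p for p in bottom_up_levels(data) if len(p) <= k_max]
    best = max(candidates, key=between_within_ratio)
    return len(best), best


# Three small, well-separated groups of 2-D points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]
best_k, best_partition = choose_cluster_number(points, k_max=6)
print(best_k)
```

Restricting the candidates to at most `k_max` clusters keeps the heavily over-split early levels out of the comparison; any validity index with the same "higher is better" convention could be substituted for the stand-in score.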
ZHOU Hong-Fang, LI Hong-Yan, LIU Ying, WANG Xiao-Dong. Research on Determination Algorithm of Clustering Number in Multi-dimensional Dataset[J]. Computer Engineering, 2012, 38(9): 8-11.
周红芳, 李红岩, 刘颖, 王晓东. 多维数据集中聚类数确定算法研究[J]. 计算机工程, 2012, 38(9): 8-11.