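Abstract (Chinese version): In exploratory cluster analysis of gene expression data, determining the number of clusters is a key factor in clustering quality. Many cluster validity indices and methods can be used with the PAM clustering algorithm. This paper discusses seven commonly used validity indices and methods suited to the PAM algorithm and compares their performance experimentally on gene expression data with four different types of cluster structure. The results show that the system evolution method and the stability-based method perform best at estimating the number of clusters, with accuracies of 100% and 90%, respectively.
Keywords (Chinese version):
cluster validity,
estimation of the number of clusters,
cluster analysis,
gene expression data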
Abstract: Estimating the number of clusters is a crucial problem when applying the robust Partitioning Around Medoids (PAM) clustering algorithm to gene expression data. This paper discusses seven cluster validation methods for the PAM algorithm and compares them experimentally on estimating the number of clusters, using simulated and real gene expression data with four different types of cluster structure. Experimental results show that the system evolution method and the stability-based method give the best estimates, with accuracies of 100% and 90%, respectively.
Keywords:
cluster validation,
estimation of the number of clusters,
cluster analysis,
gene expression data
王开军; 李健; 张军英; 过立新. 聚类分析中类数估计方法的实验比较[J]. 计算机工程, 2008, 34(9): 198-199.
WANG Kai-jun; LI Jian; ZHANG Jun-ying; GUO Li-xin. Experimental Comparison of Clusters Number Estimation for Cluster Analysis[J]. Computer Engineering, 2008, 34(9): 198-199.
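
As a rough illustration of the task the abstract describes, the sketch below estimates the number of clusters by running a PAM-style clustering for several candidate values of k and scoring each result with a validity index. Several things here are assumptions rather than the paper's procedure: the mean silhouette width is used as a stand-in index (the abstract does not list the seven methods it compares), the medoid search is a simplified swap loop with a naive initialization rather than the full PAM BUILD/SWAP procedure, and the names `pam`, `mean_silhouette`, `estimate_k` and the toy `POINTS` data are hypothetical.

```python
import math


def pam(points, k):
    """Simplified PAM-style k-medoids; returns (medoids, labels, distances)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    def cost(medoids):
        # Total distance from every point to its nearest medoid.
        return sum(min(d[i][m] for m in medoids) for i in range(n))

    medoids = list(range(k))  # assumption: naive start from the first k points
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                # Try replacing one medoid with a non-medoid point.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(trial)
                if c < best - 1e-12:
                    medoids, best, improved = trial, c, True
    # Assign each point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return medoids, labels, d


def mean_silhouette(d, labels, k):
    """Mean silhouette width; singleton clusters contribute 0."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(d[i][j] for j in own) / len(own)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in range(k)
            if c != labels[i] and labels.count(c) > 0
        )
        total += (b - a) / max(a, b)
    return total / n


def estimate_k(points, k_range):
    """Return the k with the highest mean silhouette, plus all scores."""
    scores = {}
    for k in k_range:
        _, labels, d = pam(points, k)
        scores[k] = mean_silhouette(d, labels, k)
    return max(scores, key=scores.get), scores


# Toy data (not gene expression data): three well-separated groups in 2-D.
POINTS = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]

best_k, scores = estimate_k(POINTS, range(2, 6))
print("estimated number of clusters:", best_k)
```

On clean, well-separated data such as this toy set, most validity indices agree; the comparison reported in the abstract concerns harder cluster structures, where the choice of index matters.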