Computer Engineering, 2008, Vol. 34, Issue 9: 198-199. doi: 10.3969/j.issn.1000-3428.2008.09.071

• Artificial Intelligence and Recognition Technology •

Experimental Comparison of Clusters Number Estimation for Cluster Analysis

WANG Kai-jun1, LI Jian2, ZHANG Jun-ying1, GUO Li-xin3   

  1. School of Computer Science and Engineering, Xidian University, Xi’an 710071; 2. Net Information Center, Northwest University of Political Science and Law, Xi’an 710061; 3. Xi’an Institute of Post and Telecommunications, Xi’an 710061
  • Online: 2008-05-05 Published: 2008-05-05

Abstract: Estimating the number of clusters is a crucial problem when applying the robust Partitioning Around Medoids (PAM) clustering algorithm to gene expression data. This paper discusses seven cluster validation methods applicable to PAM and compares them experimentally on estimating the number of clusters, using simulated and real gene expression data sets that exhibit four different types of cluster structure. The experimental results show that the system evolution method and the stability-based method perform best, with estimation accuracies of 100% and 90%, respectively.
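To make the task concrete, the following is a minimal sketch of estimating the number of clusters with PAM and a validity index. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not name the seven methods here, so the mean silhouette width is used as a stand-in criterion, the data are synthetic two-dimensional points rather than gene expression profiles, and all function names (`pam`, `mean_silhouette`, `estimate_num_clusters`) are hypothetical.

```python
import math
import random


def pairwise_distances(points):
    """Full Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k):
    """Small PAM: greedy BUILD phase, then best-improvement SWAP phase."""
    n = len(dist)
    medoids = []
    # BUILD: repeatedly add the point that lowers the total cost the most.
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates,
                           key=lambda c: total_cost(dist, medoids + [c])))
    # SWAP: apply the best medoid/non-medoid swap until none lowers the cost.
    while True:
        best_cost, best_trial = total_cost(dist, medoids), None
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best_cost - 1e-12:
                    best_cost, best_trial = cost, trial
        if best_trial is None:
            break
        medoids = best_trial
    # Assign every point to its nearest medoid.
    labels = [min(range(k), key=lambda j: row[medoids[j]]) for row in dist]
    return medoids, labels


def mean_silhouette(dist, labels, k):
    """Average silhouette width; singleton clusters contribute 0."""
    n = len(dist)
    clusters = [[i for i in range(n) if labels[i] == j] for j in range(k)]
    total = 0.0
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for idx, other in enumerate(clusters)
                if idx != labels[i] and other)
        total += (b - a) / max(a, b)
    return total / n


def estimate_num_clusters(dist, k_max):
    """Return the k in 2..k_max with the largest mean silhouette width."""
    scores = {}
    for k in range(2, k_max + 1):
        _, labels = pam(dist, k)
        scores[k] = mean_silhouette(dist, labels, k)
    return max(scores, key=scores.get), scores


# Synthetic data (an assumption for illustration): three well-separated blobs.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in centers for _ in range(15)]

dist = pairwise_distances(points)
best_k, scores = estimate_num_clusters(dist, k_max=6)
print("estimated number of clusters:", best_k)
```

Because PAM works from a distance matrix, the same sketch could in principle be run with any dissimilarity suited to expression profiles (for example a correlation-based distance) by replacing `pairwise_distances`.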

Key words: cluster validation, cluster number estimation, cluster analysis, gene expression data

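Abstract (translated from the Chinese): In exploratory cluster analysis of gene expression data, determining the number of clusters is the key factor in clustering quality. Many cluster validity indices and methods can be used with the PAM clustering algorithm. This paper discusses seven commonly used indices and methods suited to PAM and compares their performance experimentally on gene expression data with four different types of cluster structure. The results show that the system evolution method and the stability-based method estimate the number of clusters best, with accuracies of 100% and 90%, respectively.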
