
Computer Engineering (计算机工程), 2008, Vol. 34, Issue 9: 198-199. doi: 10.3969/j.issn.1000-3428.2008.09.071

• Artificial Intelligence and Recognition Technology •

Experimental Comparison of Clusters Number Estimation for Cluster Analysis

WANG Kai-jun1, LI Jian2, ZHANG Jun-ying1, GUO Li-xin3

  1. School of Computer Science and Engineering, Xidian University, Xi’an 710071; 2. Net Information Center, Northwest University of Political Science and Law, Xi’an 710061; 3. Xi’an Institute of Post and Telecommunications, Xi’an 710061
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-05-05 Published: 2008-05-05

Abstract: In exploratory cluster analysis of gene expression data, determining the number of clusters is a key factor in clustering quality. Many cluster validity indices and methods can be used with the robust Partitioning Around Medoids (PAM) clustering algorithm. This paper discusses seven commonly used validation indices and methods suited to PAM and compares their performance experimentally on simulated and real gene expression data exhibiting four different types of cluster structure. The results show that the system evolution method and the stability-based method estimate the number of clusters best, with accuracies of 100% and 90%, respectively.

Key words: cluster validation, clusters number estimation, cluster analysis, gene expression data
