
Computer Engineering ›› 2012, Vol. 38 ›› Issue (9): 8-11. doi: 10.3969/j.issn.1000-3428.2012.09.003

• Doctoral Thesis •

Research on Determination Algorithm of Clustering Number in Multi-dimensional Dataset

ZHOU Hong-fang 1, LI Hong-yan 1, LIU Ying 2, WANG Xiao-dong 3

  1. School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China; 2. School of Computer, Panzhihua University, Panzhihua, Sichuan 617000, China; 3. PLA Air Defense Forces Command Academy, Zhengzhou 450052, China
  • Received: 2011-09-13 Online: 2012-05-05 Published: 2012-05-05
  • About the authors: ZHOU Hong-fang (born 1976), female, associate professor, Ph.D.; main research interests: data warehousing, data mining, knowledge discovery, and rough sets. LI Hong-yan, master's student. LIU Ying and WANG Xiao-dong, lecturers, master's degree holders.
  • Funding:
    Key Project of the National "863" Program (2007AA010305); Natural Science Basic Research Program of Shaanxi Province (SJ08-ZT14); Scientific Research Program of the Shaanxi Provincial Education Department (06JK229, 09JK683)

Abstract: To better determine the optimal number of clusters in multi-dimensional data, this paper proposes a new algorithm, MHC, which builds on the principles of traditional algorithms for determining the number of clusters in a dataset. The algorithm uses a bottom-up strategy to generate partitions of the dataset at different levels, computes the clustering quality of the partition at each level, and selects the optimal number of clusters according to that quality. The paper also designs a new clustering validity index, Between-In-Proportion (BIP), which measures the clustering quality of different partitions and relies mainly on the geometric structure of the dataset. Theoretical analysis and experimental results verify the effectiveness and good performance of the new validity index and the MHC algorithm, and show that the algorithm can accurately determine the optimal number of clusters in multi-dimensional datasets.

Key words: multi-dimensional dataset, clustering number, clustering validity index, hierarchical clustering
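The general scheme outlined in the abstract (build partitions bottom-up, score each level, keep the best-scoring number of clusters) can be illustrated with a minimal Python sketch. This is not the MHC algorithm, and the abstract does not give the BIP formula. As assumptions, the sketch uses plain centroid-linkage agglomeration and substitutes the well-known Calinski-Harabasz between/within ratio for the validity index. All function names and the sample data are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(points):
    """Bottom-up merging; returns {k: partition with k clusters} for k = n..1.

    Assumption: centroid linkage (merge the two clusters whose centroids
    are closest). The paper's actual merge criterion is not stated in the
    abstract.
    """
    clusters = [[p] for p in points]
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sq_dist(centroid(clusters[ij[0]]),
                                   centroid(clusters[ij[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        levels[len(clusters)] = [list(c) for c in clusters]
    return levels


def between_within_ratio(partition):
    """Calinski-Harabasz index, used here as a STAND-IN for BIP (not BIP itself).

    Requires 2 <= k < n, where k is the number of clusters.
    """
    n = sum(len(c) for c in partition)
    k = len(partition)
    overall = centroid([p for c in partition for p in c])
    between = sum(len(c) * sq_dist(centroid(c), overall) for c in partition)
    within = sum(sq_dist(p, centroid(c)) for c in partition for p in c)
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def best_cluster_count(points, k_max):
    """Score every level from 2 to k_max clusters and return the best k."""
    levels = agglomerate(points)
    scores = {k: between_within_ratio(levels[k]) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


# Hypothetical 3-D sample: three tight, well-separated groups of four points.
demo_points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0), (5.0, 5.0, 5.1),
    (10.0, 0.0, 5.0), (10.1, 0.0, 5.0), (10.0, 0.1, 5.0), (10.0, 0.0, 5.1),
]
best_k, scores = best_cluster_count(demo_points, k_max=6)
print(best_k)  # 3 for this sample
```

The naive closest-pair search makes the sketch roughly cubic in the number of points, which is acceptable for illustration only. Any other index that compares between-cluster separation with within-cluster compactness could be substituted for `between_within_ratio` without changing the surrounding selection loop.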
