Computer Engineering ›› 2006, Vol. 32 ›› Issue (12): 47-49.

• Software Technology and Database •

Research on Efficient Clustering Algorithm for Mixed Attributes

JIANG Shengyi (蒋盛益) 1, RUAN Youlin (阮幼林) 2, LI Qinghua (李庆华) 3

  1. College of Information, Guangdong University of Foreign Studies, Guangzhou 510420; 2. School of Information Engineering, Wuhan University of Technology, Wuhan 430070; 3. Department of Computer Science, Huazhong University of Science and Technology, Wuhan 430074
  • Online: 2006-06-20 Published: 2006-06-20

Abstract: The cosine of the angle between two vectors is generalized to data with mixed attributes, and a similarity-based clustering algorithm named CABMS, which follows the rule of maximum similarity, is presented. A simple and effective strategy for calculating the clustering threshold is also put forward. CABMS has nearly linear time complexity in both the size of the dataset and the number of attributes, which gives it good scalability. Experimental results show that CABMS produces high-quality clusters.

Key words: Similarity; Clustering; Data mining
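
The abstract does not give the similarity formula or the threshold strategy, so the following is only a minimal sketch of the general idea, not the CABMS algorithm itself. It assumes that each record has numeric and categorical parts, that a cluster is summarized by its numeric centroid and per-attribute category frequencies, and that categorical values behave like one-hot vectors inside a single cosine. The names `Cluster`, `similarity`, and `cluster_one_pass` are hypothetical, and the threshold is simply passed in as a parameter.

```python
from collections import Counter
from math import sqrt


class Cluster:
    """Cluster summary: numeric sums plus per-attribute category counts."""

    def __init__(self, n_num, n_cat):
        self.size = 0
        self.num_sum = [0.0] * n_num
        self.cat_counts = [Counter() for _ in range(n_cat)]

    def add(self, num, cat):
        self.size += 1
        for i, x in enumerate(num):
            self.num_sum[i] += x
        for counter, value in zip(self.cat_counts, cat):
            counter[value] += 1


def similarity(num, cat, cluster):
    """Cosine between a record and a cluster summary (assumed definition).

    Each categorical value is treated as a one-hot vector and the cluster
    as relative frequencies, so the record acts like one long vector.
    """
    n = cluster.size
    centroid = [s / n for s in cluster.num_sum]
    dot = sum(x * c for x, c in zip(num, centroid))
    dot += sum(cnt[v] / n for cnt, v in zip(cluster.cat_counts, cat))
    obj_sq = sum(x * x for x in num) + len(cat)
    clu_sq = sum(c * c for c in centroid)
    clu_sq += sum((k / n) ** 2 for cnt in cluster.cat_counts for k in cnt.values())
    if obj_sq == 0 or clu_sq == 0:
        return 0.0
    return dot / sqrt(obj_sq * clu_sq)


def cluster_one_pass(records, threshold):
    """Assign each record to its most similar cluster in a single scan.

    A new cluster is opened when the best similarity is below `threshold`.
    """
    clusters, labels = [], []
    for num, cat in records:
        best, best_sim = -1, -1.0
        for idx, c in enumerate(clusters):
            s = similarity(num, cat, c)
            if s > best_sim:
                best, best_sim = idx, s
        if best < 0 or best_sim < threshold:
            clusters.append(Cluster(len(num), len(cat)))
            best = len(clusters) - 1
        clusters[best].add(num, cat)
        labels.append(best)
    return labels, clusters


# Toy data: (numeric attributes, categorical attributes) per record.
records = [
    ([1.0, 0.0], ["a"]),
    ([0.9, 0.1], ["a"]),
    ([0.0, 1.0], ["b"]),
    ([0.1, 0.9], ["b"]),
]
labels, clusters = cluster_one_pass(records, threshold=0.8)
print(labels)  # [0, 0, 1, 1]
```

Because each record is compared only against the current cluster summaries, one scan costs time proportional to the number of records times the number of attributes for a bounded number of clusters, which is in line with the near-linear behaviour the abstract describes.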