
Computer Engineering

• Artificial Intelligence and Recognition Technology •

Hierarchical Clustering Algorithm Based on Attribute Partitioning and Curve Distance

夏卓群1,欧慧1,武志伟1,范开钦2   

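  (1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114; 2. The State Taxation Bureau of Hunan Province, Changsha 410114)
  • Received: 2014-08-18 Published: 2015-08-15 Online: 2015-08-15
  • About the authors: XIA Zhuoqun (1977-), male, associate professor and postdoctoral researcher, main research area: data mining; OU Hui (corresponding author) and WU Zhiwei, master's students; FAN Kaiqin, senior engineer, M.S.
  • Funding:
    Natural Science Foundation of Hunan Province (14JJ7043); Science and Technology Progress and Innovation Fund of the Hunan Provincial Department of Transportation (201405).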

Hierarchical Clustering Algorithm Based on Attribute Partitioning and Curve Distance

XIA Zhuoqun1, OU Hui1, WU Zhiwei1, FAN Kaiqin2

  (1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China; 2. The State Taxation Bureau of Hunan Province, Changsha 410114, China)
  • Received: 2014-08-18 Online: 2015-08-15 Published: 2015-08-15

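Abstract: The traditional k-means algorithm chooses its initial centers at random. In addition, over a large range, parameters that use manifold distance as the similarity measure cannot adequately reflect the global consistency of a data set. To address these problems, a hierarchical clustering algorithm based on attribute partitioning and curve distance is proposed. Following the attribute-partitioning idea from granular computing and the max-min distance rule, the algorithm selects initial cluster centers and runs k-means to perform a coarse clustering that yields the initial-stage class representatives (exemplars). Using a new distance measure, curve distance, together with a criterion function that favors high intra-class similarity and low inter-class similarity, it then clusters and merges these initial-stage exemplars to obtain the desired exemplars. Finally, each data point takes its class label from the label of its representative. Experimental results show that, compared with other algorithms, the proposed algorithm better reflects the global consistency of the data set and reduces running time.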

Keywords: curve distance, attribute partitioning, max-min distance, cluster classification, class label
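The abstract does not specify the attribute-partitioning step or any parameter values, so the following is only a minimal Python sketch of the generic pieces it names: the max-min distance rule for choosing initial centers, a k-means pass for the coarse clustering, and the final step in which each point inherits its representative's label. It assumes plain Euclidean distance; the threshold `theta`, the iteration cap, and every function name are hypothetical, and the granular-computing attribute partitioning is omitted entirely.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Plain Euclidean distance (assumed; the paper's stage-one metric is not stated)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_centers(points: List[Point], theta: float = 0.5) -> List[int]:
    """Max-min distance rule: return the indices of the initial centers.

    The first center is point 0 and the second is the point farthest from it.
    Each further center is the point whose distance to its nearest chosen
    center is largest, accepted only while that distance exceeds
    theta * (distance between the first two centers). theta is hypothetical.
    """
    centers = [0]
    second = max(range(len(points)), key=lambda i: euclidean(points[i], points[0]))
    if second == 0:  # every point coincides with point 0: one center suffices
        return centers
    centers.append(second)
    base = euclidean(points[0], points[second])
    while True:
        # distance from each point to its nearest already-chosen center
        nearest = [min(euclidean(p, points[c]) for c in centers) for p in points]
        cand = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[cand] <= theta * base:
            return centers
        centers.append(cand)


def kmeans(points: List[Point], init: List[Point],
           iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd iterations from the given centers (the coarse clustering stage)."""
    cents = [list(c) for c in init]
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # assign every point to its closest current centroid
        new = [min(range(len(cents)), key=lambda k: euclidean(p, cents[k])) for p in points]
        if new == labels:  # assignments stopped changing: converged
            break
        labels = new
        # move each centroid to the mean of its members (empty clusters keep their centroid)
        for k in range(len(cents)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                cents[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels or [], cents


def inherit_labels(exemplar_of: List[int], class_of_exemplar: List[int]) -> List[int]:
    """Final step: each point takes the class label of its representative exemplar."""
    return [class_of_exemplar[e] for e in exemplar_of]
```

For example, on the six points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) with `theta = 0.5`, `max_min_centers` picks indices 0 and 4, and `kmeans` started from those two points returns the labels [0, 0, 0, 1, 1, 1]. The max-min rule is used here because it spreads the starting centers apart deterministically, which is the weakness of random initialization that the abstract points to.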

Abstract: The traditional k-means algorithm selects its initial clustering centers at random, and on a large scale, methods whose parameters rely on manifold distance as the similarity measure fail to capture the global consistency of the data set. To address these problems, a hierarchical clustering algorithm based on attribute partitioning and curve distance is proposed. Following the attribute-partitioning idea of granular computing and the max-min distance method, it selects initial cluster centers and performs a coarse clustering with k-means to obtain early-stage exemplars. Using a new distance measure, curve distance, and a criterion function that favors high intra-class similarity and low inter-class similarity, it then clusters the early-stage exemplars to obtain the expected exemplars. Each data point is assigned the label of its corresponding representative exemplar. Experimental results show that the algorithm preserves the global consistency of the data set well and reduces running time.

Key words: curve distance, attribute partitioning, max-min distance, cluster classification, class label
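The abstract contrasts the proposed curve distance with manifold distance but does not define either. As background only, the minimal Python sketch below computes one density-sensitive formulation of manifold distance from the clustering literature, together with a simple ratio criterion of the kind the abstract describes (high intra-class similarity, low inter-class similarity). The edge formula, the parameter `rho`, the ratio form of the criterion, and all function names are assumptions. This is not the paper's curve distance or its criterion function. It requires Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def density_sensitive_distances(points: List[Point], rho: float = 2.0) -> List[List[float]]:
    """All-pairs manifold-style distance (an assumed formulation, not the paper's curve distance).

    Each direct edge costs rho ** euclid(x, y) - 1 with rho > 1, so one long
    jump costs more than several short hops through a dense region. The
    distance between two points is the cheapest path (Floyd-Warshall).
    Coordinates should be scaled so that rho ** distance does not overflow.
    """
    n = len(points)
    d = [[rho ** math.dist(points[i], points[j]) - 1.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # relax the path i -> k -> j
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def criterion(dist: List[List[float]], labels: List[int]) -> float:
    """Hypothetical criterion: mean intra-class / mean inter-class distance (lower is better)."""
    intra: List[float] = []
    inter: List[float] = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    if not intra or not inter:
        return math.inf  # a single class or all singletons: no meaningful ratio
    mean_inter = sum(inter) / len(inter)
    if mean_inter == 0:
        return math.inf
    return (sum(intra) / len(intra)) / mean_inter
```

On the four collinear points (0, 0), (1, 0), (2, 0), (3, 0) with `rho = 2`, the direct edge from the first point to the last costs 2**3 - 1 = 7. The path through the two middle points costs 3, so the distance between the end points is 3. The grouping [0, 0, 1, 1] scores 0.5 under the ratio criterion, which is lower than the 4/3 of the interleaved grouping [0, 1, 0, 1]. This illustrates why path-based distances are associated with global consistency: points linked through a chain of close neighbors stay close, even when their straight-line gap is large.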
