Hierarchical Clustering Algorithm Based on Attribute Partitioning and Curve Distance

doi:10.3969/j.issn.1000-3428.2015.08.032

Computer Engineering

XIA Zhuoqun¹, OU Hui¹, WU Zhiwei¹, FAN Kaiqin²

(1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China; 2. The State Taxation Bureau of Hunan Province, Changsha 410114, China)

Received: 2014-08-18 Online: 2015-08-15 Published: 2015-08-15
About the authors: XIA Zhuoqun (b. 1977), male, associate professor and postdoctoral researcher, main research interest: data mining; OU Hui (corresponding author) and WU Zhiwei, master's students; FAN Kaiqin, senior engineer, master's degree.
Funding: Natural Science Foundation of Hunan Province (14JJ7043); Science and Technology Progress and Innovation Fund of the Hunan Provincial Department of Transportation (201405).

Abstract

Abstract: The traditional k-means algorithm selects its initial centers at random, and, over large scales, parameters that use manifold distance as the similarity measure fail to reflect the global consistency of a data set well. To address these problems, a hierarchical clustering algorithm based on attribute partitioning and curve distance is proposed. Initial cluster centers are selected using the attribute-partitioning idea from granular computing together with the max-min distance rule, and k-means then performs a coarse clustering to obtain the early-stage exemplars. Using a new distance measure, the curve distance, and a criterion function that favors high within-class similarity and low between-class similarity, the early-stage exemplars are clustered and classified to obtain the expected exemplars. Each data point is then assigned the class label of its representative exemplar. Experimental results show that, compared with other algorithms, the proposed algorithm better reflects the global consistency of the data set and reduces running time.

Key words: curve distance, attribute partitioning, max-min distance, cluster classification, class label
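
As a rough illustration of two steps named in the abstract, the sketch below shows one common form of the max-min distance rule for choosing initial centers (farthest-first selection) and the final step in which each data point takes the class label of its exemplar. It is not the authors' implementation: the abstract does not define the curve distance or the attribute-partitioning procedure, so plain Euclidean distance is used instead, the first point is taken as the seed, and the names `max_min_centers` and `inherit_labels` are hypothetical.

```python
import math


def max_min_centers(points, k):
    """Pick k initial centers with a max-min distance (farthest-first) rule.

    Assumptions: Euclidean distance stands in for the paper's measure,
    and the first point is used as the seed center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[0]]
    # min_dist[i] is the distance from points[i] to its nearest chosen center.
    min_dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point whose nearest center is farthest away.
        idx = max(range(len(points)), key=min_dist.__getitem__)
        centers.append(points[idx])
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[idx]))
    return centers


def inherit_labels(point_to_exemplar, exemplar_label):
    """Give every data point the class label of its exemplar."""
    return [exemplar_label[e] for e in point_to_exemplar]


points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0)]
centers = max_min_centers(points, 3)

# Hypothetical second stage: exemplars 0 and 1 were merged into class "A".
labels = inherit_labels([0, 0, 1, 1, 2], {0: "A", 1: "A", 2: "B"})
```

Centers chosen this way are spread across the data rather than drawn at random, which is the usual motivation for seeding k-means with a max-min rule. Because every point only looks up the label of its exemplar, the second-stage clustering has to run on the exemplars alone, not on the full data set.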

CLC number: TP301.6

XIA Zhuoqun, OU Hui, WU Zhiwei, FAN Kaiqin. Hierarchical Clustering Algorithm Based on Attribute Partitioning and Curve Distance[J]. Computer Engineering, doi: 10.3969/j.issn.1000-3428.2015.08.032.

http://www.ecice06.com/CN/Y2015/V41/I8/174

