An Overlapping Subspace K-Means Clustering Algorithm

Computer Engineering, 2020, Vol. 46, Issue (8): 58-63, 71. doi: 10.19678/j.issn.1000-3428.0054555

LIU Yuhang¹, MA Huifang¹,², LIU Haijiao¹, YU Li¹

1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China;
2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

Received: 2019-04-10 Revised: 2019-07-01 Published: 2019-07-17
About the authors: LIU Yuhang (b. 1996), male, master's student, main research interest: intelligent computing; MA Huifang (corresponding author), professor, Ph.D.; LIU Haijiao, master's student; YU Li, lecturer, Ph.D.
Funding:
National Natural Science Foundation of China (61762078, 61363058); Research Project of the Guangxi Key Laboratory of Trusted Software (kx202003); Open Fund of the Guangxi Key Laboratory of Multi-source Information Mining and Security (MIMS18-08); Major Project of the 2019 Young Teachers' Scientific Research Ability Improvement Program of Northwest Normal University (NWNU-LKQN2019-2).

Abstract

Abstract: Most existing clustering algorithms for high-dimensional sparse data consider neither overlapping clusters nor outliers, which leads to unsatisfactory clustering results. To address this, this paper proposes an overlapping subspace K-Means clustering algorithm. A strategy for computing cluster subspaces is designed: the attribute subspace of each cluster is updated dynamically during clustering, and a constraint function is defined to guide the clustering process, so that clusters may overlap and outliers are controlled. On this basis, an objective function is defined to modify the traditional K-Means algorithm: an entropy weight constraint is used to compute the weight of each dimension in each cluster, the weights indicate the relative importance of the dimensions in different clusters, and parameters are added to control the degree of overlap and the number of outliers. Experimental results on synthetic and real data sets show that the proposed algorithm outperforms EWKM, NEO-K-Means, OKM and other subspace clustering algorithms in terms of NMI and F1.

Key words: objective function, subspace clustering, outlier, entropy weight constraint, K-Means clustering algorithm
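The abstract says that an entropy weight constraint is used to compute a weight for each dimension within each cluster. This page does not give the paper's objective function, so the following is only a minimal sketch that assumes the closed-form weight update commonly associated with entropy-weighted K-Means (the EWKM baseline named in the abstract): each dimension's weight is proportional to `exp(-D_j / gamma)`, where `D_j` is the within-cluster dispersion of dimension `j`. The function name `entropy_weights` and the parameter name `gamma` are illustrative assumptions, and the overlap and outlier parameters of the proposed algorithm are not modeled.

```python
import math


def entropy_weights(points, center, gamma=1.0):
    """EWKM-style dimension weights for a single cluster (illustrative).

    points: list of equal-length feature vectors assigned to the cluster
    center: the cluster center, same length as each point
    gamma:  positive entropy-regularization strength (assumed name)
    """
    dims = len(center)
    # Within-cluster dispersion of each dimension: sum of squared deviations.
    dispersion = [
        sum((p[j] - center[j]) ** 2 for p in points) for j in range(dims)
    ]
    # Shift by the minimum before exponentiating for numerical stability;
    # the shift cancels out in the normalization below.
    d_min = min(dispersion)
    scores = [math.exp(-(d - d_min) / gamma) for d in dispersion]
    total = sum(scores)
    # Normalize so the weights of one cluster sum to 1.
    return [s / total for s in scores]


# Toy cluster: tight along dimension 0, widely spread along dimension 1.
cluster = [[1.0, 0.0], [1.1, 4.0], [0.9, -4.0]]
center = [1.0, 0.0]
weights = entropy_weights(cluster, center, gamma=10.0)
print(weights)
```

In this toy cluster the compact dimension receives most of the weight, which is the sense in which the weights mark the relative importance of dimensions within a cluster. A larger `gamma` flattens the weights toward uniform, while a smaller one concentrates them on the least dispersed dimensions.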

CLC number: TP18

LIU Yuhang, MA Huifang, LIU Haijiao, YU Li. An Overlapping Subspace K-Means Clustering Algorithm[J]. Computer Engineering, 2020, 46(8): 58-63, 71.

https://www.ecice06.com/CN/Y2020/V46/I8/58


