Improved Clustering Algorithm Based on Local and Global Information

doi:10.3969/j.issn.1000-3428.2015.06.030

Abstract

The traditional K-means algorithm selects its initial cluster centers at random, which easily makes its results unstable. Spectral clustering partitions the data directly on the similarity matrix, which strongly affects the accuracy of its results. Clustering with local and global regularization does not consider how the spatial distribution of the data affects the result. To address these problems, this paper introduces a dispersion matrix to improve the local and global regularization clustering algorithm. The improved algorithm takes the distribution of the data into account: it introduces the dispersion matrix into the objective function for local information, combines it with the objective function for global information, and converts the minimization of the resulting objective into an eigendecomposition of a sparse symmetric matrix. Experiments on UCI machine learning data sets and public data mining data sets show that the proposed algorithm achieves higher prediction accuracy than K-means and standard spectral clustering.

Key words: K-means algorithm, spectral clustering, dispersion matrix, eigendecomposition, UCI data set
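The abstract describes turning the minimization of a clustering objective into an eigendecomposition of a sparse symmetric matrix. As a rough illustration of that general pattern only, the sketch below implements standard normalized spectral clustering, one of the baselines named in the abstract, and not the proposed algorithm: it contains no dispersion matrix and no local or global regularization terms. The function names (`knn_affinity`, `spectral_embedding`, `farthest_point_kmeans`, `spectral_clustering_sketch`), the k-nearest-neighbour graph, and the deterministic farthest-point initialization are all assumptions made for this example.

```python
import numpy as np


def knn_affinity(X, n_neighbors=5):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix with a zero diagonal."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                       # exclude self-matches
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]      # nearest neighbours per row
    n = X.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                          # symmetrize


def spectral_embedding(W, n_clusters):
    """Row-normalized eigenvectors of the normalized graph Laplacian."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    L = np.eye(W.shape[0]) - s[:, None] * W * s[None, :]  # I - D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    Y = vecs[:, :n_clusters]                   # smallest-eigenvalue eigenvectors
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    return Y / np.maximum(norms, 1e-12)


def farthest_point_kmeans(Y, n_clusters, n_iter=20):
    """Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def spectral_clustering_sketch(X, n_clusters, n_neighbors=5):
    """Baseline pipeline: affinity graph -> eigenvectors -> K-means."""
    W = knn_affinity(X, n_neighbors)
    Y = spectral_embedding(W, n_clusters)
    return farthest_point_kmeans(Y, n_clusters)


# Toy usage on two synthetic blobs (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)),
               rng.normal(10.0, 0.5, size=(30, 2))])
labels = spectral_clustering_sketch(X, n_clusters=2, n_neighbors=10)
print(labels)
```

The dense `np.linalg.eigh` call keeps the example short. For a large sparse symmetric matrix, a sparse solver such as `scipy.sparse.linalg.eigsh` would be the more natural choice. The farthest-point initialization is used here only to make the final K-means step repeatable, since the abstract notes that random initialization makes K-means results unstable.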

CLC number:

TP181

XU Xiaolong, WANG Shitong, MEI Xiangdong. Improved Clustering Algorithm Based on Local and Global Information[J]. Computer Engineering.

https://www.ecice06.com/CN/Y2015/V41/I6/165


