Computer Engineering

• Artificial Intelligence and Recognition Technology •

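Improved Clustering Algorithm Based on Local and Global Information

XU Xiaolong 1, WANG Shitong 1, MEI Xiangdong 2

  (1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122; 2. Zanqi Sci-Tech Development Co., Ltd., Changzhou, Jiangsu 213000)
  • Publication date: 2015-06-15 Release date: 2015-06-12
  • About the authors: XU Xiaolong (1989-), male, master's student, main research interests: artificial intelligence and pattern recognition; WANG Shitong, professor; MEI Xiangdong, senior engineer.
  • Funding: Natural Science Foundation of Jiangsu Province (BK2011417).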

Improved Clustering Algorithm Based on Local and Global Information

XU Xiaolong 1, WANG Shitong 1, MEI Xiangdong 2

  (1. School of Digital Media, Jiangnan University, Wuxi 214122, China; 2. Zanqi Sci-Tech Development Co., Ltd., Changzhou 213000, China)
  • Online: 2015-06-15 Published: 2015-06-12

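Abstract (Chinese version):

When the traditional K-means algorithm selects its initial cluster centers at random, the results tend to be unstable. The spectral clustering algorithm partitions directly on the similarity matrix, which strongly affects the accuracy of the results, while the local and global regularization clustering algorithm does not consider the influence of the spatial distribution of the data on the results. To address this, a dispersion matrix is introduced to improve the local and global regularization clustering algorithm. The improved algorithm takes the distribution information of the data into account: it introduces the dispersion matrix into the objective function for local information, combines it with the objective function for global information, and converts the objective-function minimization problem into the eigendecomposition of a sparse matrix. Experimental results on UCI machine learning data sets and public data mining data sets show that the algorithm achieves higher prediction accuracy than K-means and standard spectral clustering.

Key words (Chinese version): K-means algorithm, spectral clustering, dispersion matrix, eigendecomposition, UCI data sets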

Abstract:

The traditional K-means clustering algorithm is sensitive to initialization. Spectral clustering operates directly on the similarity matrix, which severely affects the clustering result. Clustering with local and global regularization does not take the distribution of the data set into consideration. To solve this problem, this paper introduces the dispersion matrix to improve clustering based on local and global regularization. The proposed algorithm takes the distribution of the data set into consideration by combining the local information with the dispersion matrix. The global information is then considered, yielding a final optimization problem that can be solved by the eigenvalue decomposition of a sparse symmetric matrix. The algorithms mentioned above are tested on UCI machine learning data sets and public data mining data sets. Experimental and comparison results show that the proposed algorithm performs better.

Key words: K-means algorithm, spectral clustering, dispersion matrix, eigendecomposition, UCI data set
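
For orientation, the two baselines named in the abstract can be sketched in a few lines. The code below is a minimal illustration of standard spectral clustering (eigendecomposition of a normalized graph Laplacian followed by K-means), not the improved algorithm of the paper: the dispersion matrix and the local and global regularization terms are not reproduced, since this page gives no formulas for them. The function names `spectral_embedding` and `kmeans`, the Gaussian similarity, and the parameter `sigma` are assumptions made for this sketch.

```python
import numpy as np

def spectral_embedding(X, k, sigma=1.0):
    """Map each row of X to a k-dimensional spectral coordinate (illustrative)."""
    # Gaussian similarity matrix built from pairwise squared distances
    # (the choice of kernel and of sigma is an assumption of this sketch).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Normalize each row to unit length before clustering.
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=50, seed=0):
    """Plain Lloyd iterations; the random initial centers depend on `seed`."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        # Move each non-empty cluster's center to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels
```

Calling `kmeans(spectral_embedding(X, k), k)` on an `(n, d)` data array `X` returns one integer label per row. The dense `eigh` call is only practical for small `n`; the abstract's reference to a sparse matrix suggests that a sparse eigensolver would be the natural choice at scale.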

CLC number: