
Computer Engineering ›› 2025, Vol. 51 ›› Issue (8): 141-150. doi: 10.19678/j.issn.1000-3428.0069575

• Artificial Intelligence and Pattern Recognition •

Self-Weighted Multi-View k-means Algorithm

LIN Hechuan1, XU Huiying1,*, ZHU Xinzhong1, HUANG Xiao2, LIU Ziyang1

  1. College of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, Zhejiang, China
  2. College of Education, Zhejiang Normal University, Jinhua 321004, Zhejiang, China
  • Received: 2024-03-15 Revised: 2024-04-10 Online: 2025-08-15 Published: 2024-05-07
  • Contact: XU Huiying

自加权多视图k-均值算法

林合川1, 徐慧英1,*, 朱信忠1, 黄晓2, 刘子洋1

  1. 浙江师范大学计算机科学与技术学院, 浙江 金华 321004
  2. 浙江师范大学教育学院, 浙江 金华 321004
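  • Corresponding author: 徐慧英
  • Funding:
    National Natural Science Foundation of China (62376252); National Natural Science Foundation of China (61976196); Key Project of the Zhejiang Provincial Natural Science Foundation (LZ22F030003); National College Students' Innovation and Entrepreneurship Training Program, Key Innovation Training Project (202310345042)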

Abstract:

With advancements in information technology, people can describe things more accurately in increasingly diverse and complex ways, which has given rise to multi-view data. Clustering multi-view data is a fundamental topic in data mining, machine learning, pattern recognition, and related fields. In the current era of information explosion, data dimensionality is growing rapidly, and clustering such data efficiently remains a major challenge. Current multi-view k-means algorithms have limited capability when dealing with high-dimensional data. To address this issue, this paper proposes a new multi-view clustering framework, the Self-weighted Multi-view k-Means (SwMKM) algorithm. First, the algorithm adopts the least absolute criterion to guide robustness, which reduces the effect of outliers on the results. Next, the Iteratively Reweighted Least Squares (IRLS) method is used to solve for the minimum absolute residual, and the distribution of the multiple weights is adjusted adaptively to control the reweighting. Finally, a projection matrix with an $\ell_{2,1}$-norm penalty term is introduced to transform the high-dimensional feature space of the original dataset into a statistically uncorrelated, low-dimensional subspace for feature selection and noise suppression. Experimental results show that the proposed algorithm performs significantly better than other multi-view k-means algorithms on the Handwritten numerals, MSRCv1, Outdoor Scene, and other datasets.
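
As a rough illustration of two ingredients the abstract names, the sketch below shows (a) a generic IRLS loop that estimates a single robust center by minimizing the sum of unsquared Euclidean residuals, and (b) the $\ell_{2,1}$-norm of a matrix, computed as the sum of the Euclidean norms of its rows. This is not the SwMKM algorithm: the function names `irls_center` and `l21_norm`, the single-center setting, the row-wise convention for the norm, and the toy data are all assumptions made for illustration only.

```python
import numpy as np


def l21_norm(W):
    """l2,1-norm under the row-wise convention: sum of the row l2 norms."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def irls_center(X, n_iter=100, eps=1e-8):
    """Approximate argmin_c sum_i ||x_i - c||_2 with IRLS.

    Each pass solves a weighted least-squares problem whose weights are
    the inverse residual norms from the previous pass, so points far
    from the current center (likely outliers) count for less.
    """
    c = X.mean(axis=0)  # start from the ordinary least-squares center
    for _ in range(n_iter):
        residual = np.linalg.norm(X - c, axis=1)
        w = 1.0 / np.maximum(residual, eps)  # guard against division by zero
        c = (w[:, None] * X).sum(axis=0) / w.sum()
    return c


# Hypothetical toy data: four points on a unit square plus one gross outlier.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]])
mean_center = X.mean(axis=0)   # dragged far toward the outlier
robust_center = irls_center(X)  # stays close to the unit square
```

On this toy data the ordinary mean is pulled to about (20.4, 20.4) by the single outlier, while the IRLS estimate remains near the four clustered points, which is the kind of robustness the least absolute criterion is meant to provide. A row-sparsity penalty such as `l21_norm` applied to a projection matrix tends to drive entire rows toward zero, which is why it is commonly associated with feature selection.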

Key words: unsupervised learning, k-means, multi-view clustering, $\ell_{2,1}$-norm, self-weighting

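摘要 (Abstract):

With the continuous advancement of information technology, people can describe things more accurately in increasingly diverse and complex ways, which has given rise to multi-view data. Clustering multi-view data is a fundamental and important topic in data mining, machine learning, pattern recognition, and related fields. In this era of information explosion, data dimensionality keeps growing, and clustering such data effectively remains a major challenge. To address the limited capability of current multi-view k-means algorithms on high-dimensional data, a new multi-view clustering framework is proposed: the Self-weighted Multi-view k-Means (SwMKM) algorithm. First, the least absolute criterion is adopted to guide robustness and reduce the influence of outliers on the results. Then, Iteratively Reweighted Least Squares (IRLS) is used to solve for the minimum absolute residual, and the distribution of the multiple weights is adjusted adaptively to control the reweighting. Finally, a projection matrix with an $\ell_{2,1}$-norm penalty term is introduced to transform the high-dimensional feature space of the original dataset into a statistically uncorrelated, low-dimensional subspace, achieving feature selection and noise suppression. Experimental results show that SwMKM performs markedly better than other multi-view k-means algorithms on the Handwritten numerals, MSRCv1, Outdoor Scene, and other datasets, demonstrating the superiority of its clustering.

关键词 (Key words): unsupervised learning, k-means, multi-view clustering, $\ell_{2,1}$-norm, self-weighting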