
Computer Engineering, 2023, Vol. 49, Issue (1): 113-120, 129. doi: 10.19678/j.issn.1000-3428.0062723

• Artificial Intelligence and Pattern Recognition •

Riemannian Manifold Based Multi-View Spectral Clustering Algorithm

LI Linke1, KANG Zhao2, LONG Bo3

  1. Glasgow College, University of Electronic Science and Technology of China, Chengdu 611731, China;
  2. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China;
  3. Southwest Institute of Technical Physics, Chengdu 610041, China
  • Received: 2021-09-17  Revised: 2021-12-18  Published: 2021-12-21
  • About the authors: LI Linke (2001-), female, undergraduate student, research interests: machine learning, data mining, natural language processing; KANG Zhao (corresponding author), associate professor, Ph.D.; LONG Bo, research fellow.
  • Funding:
    National Natural Science Foundation of China (62276053); Student Science and Technology Innovation Fund of Glasgow College, University of Electronic Science and Technology of China.

Abstract: Most existing multi-view spectral clustering algorithms only combine the base Laplacian matrices of the individual views linearly. They ignore how differences between views affect the optimal Laplacian matrix, which limits their clustering performance. To address this problem, a Riemannian Manifold based Multi-view Spectral Clustering algorithm (RMMSC) is proposed, built on the Riemannian geometric mean and high-order Laplacian matrices. It exploits the high-order connection information and the manifold information in multi-view data, so that the optimal Laplacian matrix makes better use of the information in each view. For each view, the Laplacian matrices of different orders are linearly combined with given weights to obtain that view's base Laplacian matrix. Combining low-order and high-order connection information in this way lets the base matrices fully represent the global structure of the multi-view dataset. On this basis, the Riemannian geometric mean of the base Laplacian matrices of all views is computed and used as the optimal Laplacian matrix. This matrix is fed into spectral clustering to obtain the clustering result. Compared with the conventional arithmetic mean of matrices, the geometric mean defined on the Riemannian manifold better recovers the manifold information of the complementary layers. Experimental results on several benchmark datasets show that RMMSC achieves better clustering results than ONMSC, MLAN, AMGL and other algorithms. On the Flower17 dataset, for example, its accuracy and purity are 2.14% and 1.7% higher, respectively, than those of the baseline ONMSC, and it converges well.

Key words: multi-view spectral clustering, Riemannian geometric mean, high-order Laplacian matrix, Symmetric Positive Definite (SPD) matrix, manifold learning
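The abstract says only that each view's base Laplacian is a weighted linear combination of Laplacian matrices of different orders. The sketch below is one possible reading, not the paper's construction. It assumes the order-k Laplacian is I - S^k, where S = D^{-1/2} W D^{-1/2} is the normalized affinity and S^k encodes k-step connections. It also adds a small ridge eps * I, because graph Laplacians are only positive semidefinite while a Riemannian geometric mean needs strictly SPD inputs. The names `normalized_affinity`, `base_laplacian`, `weights` and `eps` are hypothetical.

```python
import numpy as np


def normalized_affinity(W: np.ndarray) -> np.ndarray:
    """Return S = D^{-1/2} W D^{-1/2} for a symmetric, non-negative affinity W
    with no isolated nodes (every row sum must be positive)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def base_laplacian(W: np.ndarray, weights, eps: float = 1e-3) -> np.ndarray:
    """Hypothetical base Laplacian of one view (assumed construction).

    Order-k Laplacian: L_k = I - S^k, so L_1 is the normalized Laplacian and
    higher orders use k-step connections. weights[k-1] is the (assumed
    non-negative) weight of order k. eps * I makes the result strictly SPD.
    """
    n = W.shape[0]
    S = normalized_affinity(W)
    I = np.eye(n)
    L = np.zeros((n, n))
    S_k = I
    for w in weights:
        S_k = S_k @ S            # S^k for the current order k = 1, 2, ...
        L += w * (I - S_k)
    L = (L + L.T) / 2            # remove floating-point asymmetry
    return L + eps * I


# Usage: a triangle 0-1-2 with a pendant node 3, mixing orders 1 and 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = base_laplacian(W, weights=[0.7, 0.3])
print(np.linalg.eigvalsh(L).min())
```

With non-negative weights the smallest eigenvalue is exactly eps. The vector D^{1/2}1 is a fixed point of S, so it lies in the kernel of every I - S^k, and each I - S^k is positive semidefinite.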

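The abstract contrasts the arithmetic mean with a geometric mean defined on the Riemannian manifold of SPD matrices, then applies spectral clustering to the result. The sketch below makes both steps concrete under stated assumptions. The abstract does not specify the metric or solver, so it assumes the affine-invariant metric, under which the geometric mean of several matrices is the Karcher mean, computed here with the standard fixed-point iteration. The clustering step is ordinary spectral clustering: k-means on the rows of the eigenvectors for the c smallest eigenvalues. All function names (`sym_fn`, `karcher_mean`, `spectral_clusters`, `view_laplacian`) and the toy two-view graph are illustrative, not the paper's code.

```python
import numpy as np


def sym_fn(A: np.ndarray, fn) -> np.ndarray:
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * fn(vals)) @ vecs.T


def karcher_mean(mats, tol: float = 1e-10, max_iter: int = 100) -> np.ndarray:
    """Geometric (Karcher) mean of SPD matrices under the affine-invariant
    metric, via the usual fixed-point iteration."""
    X = sum(mats) / len(mats)                      # start at the arithmetic mean
    for _ in range(max_iter):
        X_half = sym_fn(X, np.sqrt)
        X_inv_half = sym_fn(X, lambda v: 1.0 / np.sqrt(v))
        # Mean of the log-maps of all inputs in the tangent space at X
        T = np.zeros_like(X)
        for A in mats:
            M = X_inv_half @ A @ X_inv_half
            T += sym_fn((M + M.T) / 2, np.log)
        T /= len(mats)
        X = X_half @ sym_fn(T, np.exp) @ X_half     # exp-map back to the manifold
        X = (X + X.T) / 2
        if np.linalg.norm(T) < tol:                 # zero gradient: X is the mean
            break
    return X


def spectral_clusters(L: np.ndarray, c: int, n_iter: int = 50) -> np.ndarray:
    """k-means on the row-normalized eigenvectors of the c smallest eigenvalues."""
    _, vecs = np.linalg.eigh(L)                    # eigenvalues in ascending order
    U = vecs[:, :c]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    centers = [U[0]]                               # deterministic farthest-point init
    for _ in range(1, c):
        dist = np.min([np.linalg.norm(U - ctr, axis=1) for ctr in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        centers = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(c)])
    return labels


def view_laplacian(edges, n=6, eps=1e-2):
    """Toy view: unnormalized Laplacian D - W plus eps * I, so it is SPD."""
    W = np.zeros((n, n))
    for i, j, w in edges:
        W[i, j] = W[j, i] = w
    return np.diag(W.sum(axis=1)) - W + eps * np.eye(n)


# Two views of two triangles {0,1,2} and {3,4,5}, each with a different weak bridge.
triangles = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1)]
L_views = [view_laplacian(triangles + [(2, 3, 0.1)]),
           view_laplacian(triangles + [(0, 5, 0.1)])]
G = karcher_mean(L_views)
labels = spectral_clusters(G, c=2)
print(labels)
```

For commuting inputs the two means are easy to compare by hand. The geometric mean of diag(1, 4) and diag(4, 1) is 2I, while their arithmetic mean is 2.5I. The geometric mean keeps the determinant at 4, the value both inputs share, whereas the arithmetic mean inflates it to 6.25. This only illustrates that the two averages behave differently; it does not reproduce the paper's experiments.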