Computer Engineering, 2022, Vol. 48, Issue (8): 77-84, 97. doi: 10.19678/j.issn.1000-3428.0062580

• Artificial Intelligence and Pattern Recognition •

Deep Subspace Clustering Fused with Auto-Weight Learning

JIANG Yuyan1, SHAO Jin1, LI Ping2

  1. School of Management Science and Engineering, Anhui University of Technology, Maanshan, Anhui 243032, China;
  2. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2021-09-03  Revised: 2021-10-11  Published: 2022-08-09
  • About the authors: JIANG Yuyan (born 1966), female, professor, whose main research interests are machine learning and intelligent computing; SHAO Jin, master's student; LI Ping, Ph.D.
  • Funding:
    National Natural Science Foundation of China (62006126); Open Fund Project of Key Laboratories of Anhui Universities (CS2019-ZD02).

Abstract: Subspace clustering is a clustering approach for high-dimensional data that offers a distinctive self-representation of the data and high clustering accuracy. Traditional subspace clustering algorithms focus on constructing an optimal similarity graph from the input data and then partitioning it, so clustering performance depends heavily on similarity graph learning. Clustering with Adaptive Neighbors (CAN) improves similarity graph learning by adaptively assigning optimal neighbors to each data point according to the distances between points, thereby learning the similarity graph and the clustering structure together. However, existing CAN methods struggle to capture local data structure when performing nonlinear clustering of high-dimensional data, which limits their clustering accuracy and generalization ability. To mitigate these issues, this paper proposes a deep subspace clustering algorithm that fuses auto-weight learning with structured information. An autoencoder maps the data into a nonlinear latent space and reduces its dimensionality, the latent features are adaptively assigned different weights to handle noisy features, and the reconstruction error of the autoencoder is minimized to preserve local structure information. A similarity graph is learned with CAN, and the correlation among features is iteratively strengthened in the latent representation to preserve global structure information. Experimental results show that the proposed algorithm achieves accuracies of 0.7801, 0.8743, and 0.7421 on the ORL, COIL-20, and UMIST datasets, respectively, outperforming algorithms such as LRR, LRSC, SSC, and KSSC.

Key words: clustering, autoencoder, Clustering with Adaptive Neighbors (CAN), structured information, feature weight
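
The abstract describes CAN as assigning neighbors to each point according to pairwise distances. As a rough illustration only, the sketch below uses the closed-form neighbor weights commonly associated with the original CAN formulation: for point i with squared distances sorted in ascending order, the weight of neighbor j is (d_{i,k+1} - d_{ij}) / (k·d_{i,k+1} - Σ_{h≤k} d_{ih}) for its k nearest neighbors and zero elsewhere. It is not the paper's algorithm: the autoencoder, the learned feature weights, and the iterative refinement are omitted, and the names `squared_distance` and `adaptive_neighbor_graph` as well as the parameter `k` are assumptions introduced here. In the paper's setting the input would presumably be the latent representations rather than raw data.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_neighbor_graph(
    points: Sequence[Sequence[float]], k: int
) -> List[List[float]]:
    """Return an n x n similarity graph with at most k nonzero weights per row.

    Hypothetical helper: each row is non-negative and sums to 1, and closer
    neighbors receive larger weights.
    """
    n = len(points)
    if not 1 <= k <= n - 2:
        raise ValueError("k must satisfy 1 <= k <= n - 2")

    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The k + 1 nearest other points, nearest first, as (distance, index).
        nearest = sorted(
            (squared_distance(points[i], points[j]), j)
            for j in range(n)
            if j != i
        )[: k + 1]
        d_next = nearest[k][0]  # distance to the (k + 1)-th neighbor
        denom = k * d_next - sum(d for d, _ in nearest[:k])
        for d, j in nearest[:k]:
            # Closed-form weight; fall back to uniform weights when the
            # k + 1 nearest distances are all equal (zero denominator).
            graph[i][j] = (d_next - d) / denom if denom > 1e-12 else 1.0 / k
    return graph


if __name__ == "__main__":
    # Two well-separated groups of four points each (made-up toy data).
    toy = [
        [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    ]
    for row in adaptive_neighbor_graph(toy, k=2):
        print([round(w, 3) for w in row])
```

Because the weight of the (k+1)-th neighbor is exactly zero, each row is sparse by construction, so `k` directly controls how local the resulting graph is.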
