
Computer Engineering ›› 2023, Vol. 49 ›› Issue (8): 96-103, 110. doi: 10.19678/j.issn.1000-3428.0065405

• Artificial Intelligence and Pattern Recognition •

Deep Subspace Clustering Algorithm with Data Augmentation and Adaptive Self-Paced Learning

Yuyan JIANG1, Chengfeng TAO1, Ping LI2

  1. School of Management Science and Engineering, Anhui University of Technology, Maanshan 243032, Anhui, China
    2. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2022-08-01  Online: 2023-08-15  Published: 2022-10-24
  • About the authors:

    JIANG Yuyan (b. 1966), female, professor; main research interests: machine learning and intelligent computing

    TAO Chengfeng, M.S. candidate

    LI Ping, Ph.D.

  • Funding:
    National Natural Science Foundation of China (62006126)



Abstract:

Deep subspace clustering achieves better performance than traditional clustering by jointly performing self-expressive feature learning and cluster assignment. Despite the emergence of numerous deep subspace clustering algorithms across various applications, most fail to learn accurate clustering-oriented features. In this study, an improved deep subspace clustering algorithm is proposed to address the insufficient accuracy of the learned clustering-oriented feature representations, which degrades the final clustering performance. Random shifting and rotation are applied to the original samples for data augmentation; the autoencoder is then trained and optimized alternately on the augmented samples while the cluster assignments of the samples are updated, yielding more robust feature representations. In the fine-tuning phase, the target of each augmented sample in the loss function is to assign its original sample to a cluster center; this target computation may be wrong, and samples with wrong targets mislead the training of the autoencoder network. Therefore, an adaptive self-paced learning algorithm that requires no additional hyperparameters is used to select the most convincing samples in each iteration, improving generalization ability. Experiments on the MNIST, USPS, and COIL100 datasets show that the proposed algorithm reaches accuracies of 0.9318, 0.8934, and 0.7236, respectively. Ablation experiments and sensitivity analysis further verify the effectiveness of the algorithm.
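The augmentation step described in the abstract (random shifts and rotations of the original samples) can be sketched as follows. The shift range and rotation angle below are illustrative assumptions; the abstract does not state the exact values the authors used.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def augment(images, max_shift=2, max_angle=10.0, rng=None):
    """Randomly shift and rotate a batch of 2-D images.

    Ranges are illustrative assumptions, not the paper's settings:
    shifts are drawn from [-max_shift, max_shift] pixels per axis,
    angles from [-max_angle, max_angle] degrees.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty_like(images)
    for i, img in enumerate(images):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        angle = rng.uniform(-max_angle, max_angle)
        # Shift first, then rotate about the image center; pad with zeros
        # and keep the output the same shape as the input.
        shifted = shift(img, (dy, dx), mode="constant", cval=0.0)
        out[i] = rotate(shifted, angle, reshape=False,
                        mode="constant", cval=0.0)
    return out
```

In the alternating scheme the abstract describes, each training iteration would draw a fresh `augment(batch)` to train the autoencoder while the cluster assignments of the original samples are updated.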

Key words: deep learning, subspace clustering, data augmentation, adaptive self-paced learning, encoder
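The adaptive self-paced selection without extra hyperparameters might be realized by thresholding per-sample losses at a statistic of the current loss distribution, so the threshold adapts each iteration instead of being tuned. This is a heuristic sketch using the mean loss as the cutoff; the paper's exact selection rule is not given in the abstract.

```python
import numpy as np

def select_easy_samples(losses):
    """Self-paced selection sketch: keep samples whose loss is at most
    the current mean loss. The threshold is recomputed from the loss
    distribution every iteration, so no extra hyperparameter is needed.
    (Illustrative rule; the paper's exact criterion is not specified
    in the abstract.)"""
    losses = np.asarray(losses, dtype=float)
    threshold = losses.mean()          # adaptive, data-driven cutoff
    mask = losses <= threshold         # "most convincing" samples
    return np.flatnonzero(mask), mask

# Example: mean loss is 0.6, so samples 0, 2, and 4 are selected.
idx, mask = select_easy_samples([0.1, 0.9, 0.2, 1.5, 0.3])
```

Only the selected subset would then contribute to the fine-tuning loss in that iteration, keeping samples with likely-wrong cluster targets from misleading the autoencoder.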