
Computer Engineering ›› 2022, Vol. 48 ›› Issue (12): 95-103. doi: 10.19678/j.issn.1000-3428.0063279

• Artificial Intelligence and Pattern Recognition •

Multi-View Clustering Based on Feature Selection and Robust Graph Learning

HUANG Yixuan1, DU Shiqiang1,2, YU Yao1, XIAO Qingjiang2, SONG Jinmei2

  1. School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, China;
  2. School of China National Institute of Information Technology, Northwest Minzu University, Lanzhou 730030, China
  • Received: 2021-11-18 Revised: 2022-01-10 Published: 2022-01-14
  • About the authors: HUANG Yixuan (born 1996), male, master's student, whose main research interest is machine learning; DU Shiqiang, associate professor, Ph.D.; YU Yao, XIAO Qingjiang, and SONG Jinmei, master's students.
  • Supported by:
    National Natural Science Foundation of China (61866033); Northwest Minzu University Research Project for Introduced Talents (xbmuyjrc201904).

Abstract: Most existing multi-view clustering methods construct the similarity graph of each view directly on the original data samples, so redundant features and noise in the original data reduce clustering accuracy. To address this problem, a multi-view clustering algorithm based on feature selection and robust graph learning, named FRMC, is proposed. Features of the different views are selected adaptively, which lowers the data dimension and reduces redundant features. At the same time, self-representation learning is used to obtain the representation coefficients of the data, which filters out the influence of noise and captures the global structure of the data samples, thereby removing noise and outliers from the samples. On this basis, adaptive nearest neighbor learning is used to construct a robust graph for each view, and the weighted sum of the robust graph matrices forms the final affinity graph matrix. An alternating iterative algorithm based on the augmented Lagrange multiplier is proposed to optimize the objective function. Experiments are performed on six standard datasets of different types using four evaluation indicators. Comparisons with SC, RGC, AWP, and other algorithms show that FRMC effectively improves clustering accuracy and has good convergence and robustness.

Key words: multi-view clustering, feature selection, self-representation learning, adaptive nearest neighbor learning, affinity graph matrix
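
The graph-construction step described in the abstract (an adaptive nearest neighbor graph per view, then a weighted sum of the graphs) can be illustrated with a minimal sketch. This is not the authors' FRMC implementation: it omits feature selection, self-representation learning, and the augmented Lagrange multiplier optimization. It assumes the commonly used closed-form adaptive-neighbor weights and uniform view weights, and the function names `adaptive_neighbor_graph` and `fuse_graphs`, the parameter `k`, and the synthetic data are all hypothetical.

```python
import numpy as np


def adaptive_neighbor_graph(X, k=5):
    """Build a k-sparse similarity graph from samples X (n_samples x n_features).

    Assumption: uses the closed-form adaptive-neighbor weights, where each row
    has k nonzero entries that sum to 1 and closer neighbors get larger weights.
    Requires n_samples > k + 1.
    """
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of samples.
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    S = np.zeros((n, n))
    for i in range(n):
        d = D[i].copy()
        d[i] = np.inf                     # exclude the sample itself
        idx = np.argsort(d)[: k + 1]      # k nearest neighbors plus the (k+1)-th
        dk = d[idx]
        denom = k * dk[k] - np.sum(dk[:k])
        if denom > 1e-12:
            S[i, idx[:k]] = (dk[k] - dk[:k]) / denom
        else:
            S[i, idx[:k]] = 1.0 / k       # all distances tied: uniform weights
    return S


def fuse_graphs(graphs, weights=None):
    """Combine per-view graphs into one affinity matrix by a weighted sum.

    Assumption: uniform weights unless given; FRMC learns its own weights.
    """
    graphs = [0.5 * (S + S.T) for S in graphs]    # symmetrize each view graph
    if weights is None:
        weights = np.full(len(graphs), 1.0 / len(graphs))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()             # normalize to sum to 1
    return sum(w * S for w, S in zip(weights, graphs))


# Synthetic example: two views of 20 samples drawn from two well-separated clusters.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
views = [labels[:, None] * 10.0 + rng.normal(size=(20, d)) for d in (3, 5)]
graphs = [adaptive_neighbor_graph(X, k=4) for X in views]
A = fuse_graphs(graphs)
```

The fused matrix `A` could then be passed to a spectral clustering routine; in this toy setting the neighbors of every sample lie in its own cluster, so `A` is block diagonal.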

CLC number: