
Computer Engineering ›› 2023, Vol. 49 ›› Issue (10): 112-119. doi: 10.19678/j.issn.1000-3428.0066233

• Artificial Intelligence and Pattern Recognition •

Multi-View Ensemble Clustering Analysis Based on Joint Entropy

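Xiaojie ZHAO, Xueying NIU, Jifu ZHANG

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Received: 2022-11-10 Published: 2023-10-15 Online: 2023-01-12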
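  • About the authors:

    Xiaojie ZHAO (born 1997), male, master's student; main research interest: data mining

    Xueying NIU, Ph.D. candidate

    Jifu ZHANG, professor, Ph.D.

  • Supported by:
    National Natural Science Foundation of China (61876122)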

Multi-View Ensemble Clustering Analysis Based on Joint Entropy

Xiaojie ZHAO, Xueying NIU, Jifu ZHANG   

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Received: 2022-11-10 Published: 2023-10-15 Online: 2023-01-12

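Abstract:

Multi-view methods allow a problem to be analyzed from a more complete set of perspectives and make effective use of the correlated and complementary information across views, so multi-view clustering analysis has become a research hotspot in fields such as machine learning and pattern recognition. In multi-view ensemble clustering analysis, however, a base cluster (one cluster within a base clustering) contains a number of similar data objects, and its density reflects only the distribution of the data itself, not the quality of the base cluster. This paper uses joint entropy to evaluate the uncertainty and quality of base clusters and proposes a multi-view ensemble clustering analysis method. Joint entropy is used to evaluate the quality of each base cluster, and a base-cluster uncertainty index expresses the importance and quality of that cluster. The uncertainty index is then used to construct a weighted co-occurrence matrix, and a multi-view ensemble clustering algorithm (MVECJE) is proposed to improve the performance of multi-view ensemble clustering analysis. Experiments verify the importance of cluster weights in multi-view ensemble clustering analysis and show that they improve ensemble clustering performance. On the MSRC-v1, Caltech101-7, and Handwritten numerals (HW) image datasets and the Reuters text dataset, with CoregSC, AWGL, MMSC, DIMSC, COMVSC, MVKKM, and CW$ {\mathrm{K}}^{2} $M as comparison algorithms, the results show that MVECJE has a clear advantage on the NMI, ACC, and ARI evaluation metrics; on the HW dataset, all three metrics exceed 0.93.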

Key words: multi-view ensemble clustering, base cluster, weight, weighted co-occurrence matrix, joint entropy
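The abstract does not give the formulas behind the uncertainty index or the weighted co-occurrence matrix (often called a co-association matrix in the ensemble clustering literature), so the following Python sketch only illustrates the general idea and is not the MVECJE algorithm itself. It assumes an entropy-based cluster uncertainty of the kind used in locally weighted ensemble clustering: a base cluster that the other base clusterings split into many pieces has high entropy and therefore receives a low weight. The function names, the `theta` parameter, the exponential form of the weight, and the toy labels are all assumptions made for this example.

```python
import math


def cluster_members(labels):
    """Map each cluster id of one base clustering to the set of object indices."""
    members = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, set()).add(idx)
    return members


def cluster_uncertainty(cluster, base_clusterings):
    """Entropy of how `cluster` is split, summed over all base clusterings.

    Assumed formula (not taken from the paper): for every cluster C_j of every
    base clustering, p = |cluster & C_j| / |cluster| contributes -p * log2(p).
    """
    total = 0.0
    for labels in base_clusterings:
        for other in cluster_members(labels).values():
            p = len(cluster & other) / len(cluster)
            if p > 0:
                total -= p * math.log2(p)
    return total


def weighted_co_association(base_clusterings, theta=0.5):
    """Build an n x n matrix where each co-occurrence is weighted by cluster quality.

    The weight exp(-H / (theta * M)) is an assumption of this sketch: it equals 1
    for a cluster that no other base clustering splits and decays as H grows.
    """
    n = len(base_clusterings[0])
    m = len(base_clusterings)
    matrix = [[0.0] * n for _ in range(n)]
    for labels in base_clusterings:
        for cluster in cluster_members(labels).values():
            h = cluster_uncertainty(cluster, base_clusterings)
            weight = math.exp(-h / (theta * m))
            for i in cluster:
                for j in cluster:
                    matrix[i][j] += weight / m
    return matrix


# Toy example: one base clustering per view, six objects (hypothetical labels).
views = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
co = weighted_co_association(views)
for row in co:
    print([round(value, 3) for value in row])
```

Pairs that every view places together in stable clusters receive the largest entries, and pairs that are never grouped together stay at 0. A final partition would be obtained by running a standard clustering method on this matrix; the abstract does not say which one MVECJE uses.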

Abstract:

Multi-view methods offer a more comprehensive perspective on a problem and exploit the correlated and complementary information between views, so multi-view clustering analysis has become a research hotspot in machine learning and pattern recognition. However, in multi-view ensemble clustering analysis, a base cluster, that is, a cluster in a base clustering, contains several similar data objects. Its density reflects only the distribution characteristics of the data, not the quality of the base cluster. In this study, joint entropy is used to evaluate the uncertainty and quality of base clusters, and a multi-view ensemble clustering analysis method is proposed. First, the quality of the base clusters is evaluated using joint entropy and expressed through an uncertainty index. Second, a weighted co-occurrence matrix is constructed using the uncertainty index of the base clusters, and a Multi-View Ensemble Clustering algorithm based on Joint Entropy (MVECJE) is established to improve the performance of multi-view ensemble clustering analysis. Finally, experiments verify the importance of cluster weights in multi-view ensemble clustering analysis and show that the weights improve ensemble clustering performance. In addition, MVECJE is compared with CoregSC, AWGL, MMSC, DIMSC, COMVSC, MVKKM, and CW$ {\mathrm{K}}^{2} $M on the MSRC-v1, Caltech101-7, and Handwritten numerals (HW) image datasets and the Reuters text dataset. The experimental results show that MVECJE has clear advantages on the NMI, ACC, and ARI evaluation metrics; on the HW dataset, all three metrics are higher than 0.93.

Key words: multi-view ensemble clustering, base cluster, weight, weighted co-occurrence matrix, joint entropy
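For readers unfamiliar with the three evaluation metrics named in the abstract, the following Python sketch shows one common way to compute ACC, NMI, and ARI from a ground-truth labeling and a predicted clustering. The exact variants used in the paper are not stated on this page. In particular, the geometric-mean normalization of NMI, the value returned for degenerate partitions, and the brute-force label matching for ACC are assumptions of this sketch (practical implementations typically use the Hungarian algorithm for the matching).

```python
import math
from collections import Counter
from itertools import permutations


def accuracy(truth, pred):
    """ACC: best fraction of agreeing labels over one-to-one label mappings.

    Brute force over permutations, so only suitable for a handful of clusters.
    """
    true_ids = sorted(set(truth))
    pred_ids = sorted(set(pred))
    # Pad with None so surplus predicted clusters map to "no true class".
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(1 for t, p in zip(truth, pred) if mapping[p] == t)
        best = max(best, hits)
    return best / len(truth)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """NMI: mutual information divided by sqrt(H(truth) * H(pred))."""
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mi = sum(
        c / n * math.log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    denom = math.sqrt(entropy(truth) * entropy(pred))
    # Convention chosen here: 0.0 when either partition has a single cluster.
    return mi / denom if denom > 0 else 0.0


def ari(truth, pred):
    """ARI: pair-counting agreement corrected for chance."""
    n = len(truth)
    sum_joint = sum(math.comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_t = sum(math.comb(c, 2) for c in Counter(truth).values())
    sum_p = sum(math.comb(c, 2) for c in Counter(pred).values())
    expected = sum_t * sum_p / math.comb(n, 2)
    max_index = (sum_t + sum_p) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


# Hypothetical labels: six objects, two true classes, one misplaced object.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print(accuracy(truth, pred), nmi(truth, pred), ari(truth, pred))
```

All three metrics equal 1 for a clustering that matches the ground truth up to a renaming of labels. ACC and NMI lie in [0, 1], while ARI can be negative when agreement is worse than chance.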