
Computer Engineering, 2015, Vol. 41, Issue (1): 207-210. doi: 10.3969/j.issn.1000-3428.2015.01.038

• Artificial Intelligence and Recognition Technology •

Integrated Spectral Clustering Based on Distance Metric Learning

NIU Ke, ZHANG Xiaoqin, JIA Guojun

  School of Mathematics and Computer Science, Shanxi Normal University, Linfen, Shanxi 041004, China
  • Received: 2013-10-31  Revised: 2014-03-03  Online: 2015-01-15  Published: 2015-01-16
  • About the authors: NIU Ke (1987-), male, master's student, research interests: intelligent computing and software engineering; ZHANG Xiaoqin, master's student; JIA Guojun, associate professor.
  • Funding:
    Shanxi Province Soft Science Foundation project (2009041052-03)

Abstract: The performance of unsupervised clustering algorithms depends on the distance metric that the user specifies over the input data set. Because this metric directly determines how the similarity between data samples is computed, different choices of metric can have a significant effect on the clustering result. To address the problem of choosing a distance metric for spectral clustering, this paper proposes a spectral clustering algorithm based on distance metric learning with side-information. The algorithm learns a distance metric from the side-information contained in the data set itself, that is, information on whether samples drawn at random from the data set are similar to one another. The learned metric is then used in the similarity function of the spectral clustering algorithm to construct the similarity matrix. Experiments on standard UCI data sets show that the proposed algorithm achieves noticeably higher prediction accuracy than the standard spectral clustering algorithm.

Key words: data mining, side-information, similarity matrix, distance metric learning, spectral clustering, UCI data set
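The abstract describes a three-step pipeline: learn a distance metric from pairwise side-information, plug that metric into the Gaussian similarity function to build the similarity matrix, and run spectral clustering on the matrix. The sketch below illustrates that pipeline with NumPy under stated assumptions; it is not the authors' algorithm. The metric learner is a swapped-in, simplified diagonal estimate in the spirit of Relevant Component Analysis (each feature is weighted by the inverse of its mean squared difference over the "similar" pairs), so it uses only the similar half of the side-information. The spectral step follows the normalised scheme of Ng, Jordan and Weiss. The side-information is simulated from known labels, and the helper names, the median-heuristic kernel width and the synthetic data are all hypothetical.

```python
import numpy as np

# Minimal sketch. All helper names are hypothetical; the RCA-style diagonal
# metric is a simplified stand-in, not the paper's metric learner.


def sample_side_information(labels, n_pairs, rng):
    """Draw random index pairs and split them into similar / dissimilar sets.

    The side-information is simulated from known labels for this demo.
    """
    n = len(labels)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j                      # drop self-pairs
    i, j = i[keep], j[keep]
    same = labels[i] == labels[j]
    similar = np.column_stack([i[same], j[same]])
    dissimilar = np.column_stack([i[~same], j[~same]])
    return similar, dissimilar


def learn_diagonal_metric(X, similar, eps=1e-12):
    """Weight feature k by 1 / mean squared difference over the similar pairs."""
    diffs = X[similar[:, 0]] - X[similar[:, 1]]
    return 1.0 / (np.mean(diffs ** 2, axis=0) + eps)


def similarity_matrix(X, weights, sigma=None):
    """Gaussian similarity exp(-d_A^2 / (2 sigma^2)), d_A^2 = sum_k w_k (x_k - y_k)^2."""
    Z = X * np.sqrt(weights)           # rescaling features turns d_A into Euclidean distance
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    if sigma is None:                  # median heuristic for the kernel width (assumption)
        sigma = np.sqrt(np.median(sq[np.triu_indices_from(sq, k=1)]))
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def kmeans(U, k, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(W, k):
    """Normalised spectral clustering following Ng, Jordan and Weiss (2002)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)        # eigenvalues come back in ascending order
    U = vecs[:, -k:]                   # k leading eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 50)
    X = np.column_stack([4.0 * y + rng.normal(0.0, 0.3, 100),   # informative feature
                         rng.normal(0.0, 10.0, 100)])           # high-variance noise
    similar, dissimilar = sample_side_information(y, 300, rng)
    w = learn_diagonal_metric(X, similar)
    pred = spectral_clustering(similarity_matrix(X, w), k=2)
    acc = max(np.mean(pred == y), np.mean(pred != y))   # accuracy up to label swap
    print("weights:", w, "accuracy:", acc)
```

In this synthetic setup the noise feature has a far larger spread than the cluster separation, so it would dominate a plain Euclidean distance; the learned weights shrink it before the similarity matrix is built, which is the effect the abstract attributes to metric learning. A full-matrix learner that also exploits the dissimilar pairs could replace learn_diagonal_metric without changing the rest of the pipeline.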
