
Computer Engineering ›› 2021, Vol. 47 ›› Issue (8): 116-123. doi: 10.19678/j.issn.1000-3428.0058893

• Advanced Computing and Data Processing •

Spectral Clustering Algorithm for Density Adaptive Neighborhood Based on Shared Nearest Neighbors

GE Junwei, YANG Guangxin   

  1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2020-07-09 Revised: 2020-09-02 Published: 2020-08-11

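  • About the authors: GE Junwei (born 1961), male, professor, Ph.D., main research interest: big data processing; YANG Guangxin, master's student.
  • Funding:
    Chongqing Major Theme Special Project for Innovation in Common Key Technologies of Key Industries (cstc2017zdcy-zdzx0046); Chongqing Basic and Frontier Research Program (cstc2017jcyjA0755).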

Abstract: Without prior information, spectral clustering algorithms struggle to build appropriate similarity graphs for datasets with complex shapes and varying densities. In addition, the Gaussian kernel similarity measure based on Euclidean distance ignores global consistency. To address these problems, a density adaptive neighborhood spectral clustering algorithm based on shared nearest neighbors (SC-DANSN) is proposed. An undirected graph is constructed with a parameter-free density adaptive neighborhood construction method, and shared nearest neighbors are used to measure the similarity between samples. This removes the influence of parameters on similarity graph construction and reflects both global and local consistency. Experimental results show that SC-DANSN achieves higher clustering accuracy than the K-means algorithm and spectral clustering based on K Nearest Neighbor (SC-KNN), and that it is less sensitive to parameter selection than SC-KNN.

Key words: Spectral Clustering (SC), similarity matrix, density adaptive neighborhood, shared nearest neighbor, K Nearest Neighbor (KNN)
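
The idea summarized in the abstract, using shared nearest neighbors as the similarity on an undirected neighborhood graph and then applying spectral clustering, can be illustrated with a minimal NumPy sketch. This is not the authors' SC-DANSN implementation: the abstract does not specify the parameter-free density adaptive neighborhood construction, so the sketch substitutes a plain fixed-k nearest-neighbor neighborhood as a stand-in. All function names (`knn_indices`, `snn_similarity`, `kmeans`, `spectral_clustering_snn`), the value of `k`, and the toy data are assumptions made only for illustration.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of every row of X (self excluded)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def snn_similarity(X, k):
    """Shared-nearest-neighbor similarity on an undirected neighborhood graph.

    ASSUMPTION: a fixed-k neighborhood replaces the paper's parameter-free
    density adaptive neighborhood, which the abstract does not describe.
    """
    n = X.shape[0]
    nbrs = knn_indices(X, k)
    A = np.zeros((n, n), dtype=int)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1  # A[i, j] = 1 if j in N(i)
    shared = A @ A.T                 # shared[i, j] = |N(i) ∩ N(j)|
    adj = (A + A.T) > 0              # undirected edge if either lists the other
    return np.where(adj, shared, 0).astype(float)

def kmeans(Y, c, n_iter=50):
    """Small Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, c):
        d = np.min([((Y - m) ** 2).sum(axis=1) for m in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(c)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering_snn(X, n_clusters, k=6):
    """Normalized spectral clustering driven by the SNN similarity matrix."""
    W = snn_similarity(X, k)
    deg = W.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)      # eigenvalues come back in ascending order
    Y = vecs[:, :n_clusters]         # eigenvectors of the smallest eigenvalues
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, n_clusters)

# Toy data (hypothetical): two well-separated groups of 10 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(10.0, 0.3, (10, 2))])
labels = spectral_clustering_snn(X, n_clusters=2, k=6)
```

On this toy input the first ten points and the last ten points receive different cluster labels. The sketch still depends on `k`, which is exactly the kind of parameter the proposed neighborhood construction is meant to remove.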

