
Computer Engineering ›› 2022, Vol. 48 ›› Issue (1): 51-59. doi: 10.19678/j.issn.1000-3428.0059770

• Artificial Intelligence and Pattern Recognition •

A Density Clustering Algorithm with Optimized Initial Points and Adaptive Radius

WANG Zhihe, CAO Xuyan, DU Hui   

  1. School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
  • Received: 2020-10-20 Revised: 2021-01-13 Published: 2021-01-13

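  • About the authors: WANG Zhihe (王治和, born 1965), male, professor, main research interest: data mining; CAO Xuyan (曹旭琰), master's student; DU Hui (杜辉), associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61962054).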

Abstract: The DBSCAN algorithm cannot correctly cluster datasets with uneven density, and its results depend heavily on two parameters: the neighborhood threshold and the density threshold. This paper proposes a density clustering algorithm with optimized initial points and an adaptive radius. The algorithm uses the Reverse Nearest Neighbor (RNN) and a similarity matrix to find the sample with the largest current global density. It then analyzes the density distribution around that sample, adaptively calculates the neighborhood threshold of the current cluster, and clusters with DBSCAN. Experimental results on artificial datasets and UCI datasets show that, compared with the DBSCAN, OPTICS, and RNN-DBSCAN algorithms, the proposed algorithm achieves the best overall values on five evaluation indexes (ARI, NMI, Homogeneity, Completeness, and V-measure), reaching 1.0 on datasets such as Compound and Jain. The algorithm offers high clustering efficiency and accuracy.
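
The abstract does not define the density measure or the adaptive rule, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that density is estimated by counting reverse k-nearest neighbors (how many points list a given point among their own k nearest neighbors), and it uses the mean distance to the k nearest neighbors as a stand-in for the adaptive neighborhood threshold. The function names, the parameter `k`, and the radius rule are all hypothetical, and the similarity matrix mentioned in the abstract is not modeled.

```python
import math

def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (self excluded)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result

def reverse_nn_counts(points, k):
    """Count how many points include each point among their k nearest neighbors."""
    counts = [0] * len(points)
    for neighbors in knn_indices(points, k):
        for j in neighbors:
            counts[j] += 1
    return counts

def densest_seed(points, k):
    """Pick the point with the most reverse nearest neighbors as the initial point."""
    counts = reverse_nn_counts(points, k)
    return max(range(len(points)), key=lambda i: counts[i])

def local_radius(points, seed, k):
    """ASSUMPTION: mean distance from the seed to its k nearest neighbors.
    The paper's actual adaptive rule is not given in the abstract."""
    dists = sorted(math.dist(points[seed], q) for j, q in enumerate(points) if j != seed)
    return sum(dists[:k]) / k

# Toy data: a tight group of five points plus three widely spaced points.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5, 5), (8, 5), (5, 8)]
seed = densest_seed(points, k=3)
eps = local_radius(points, seed, k=3)
print(seed, eps)
```

On this toy data the seed falls inside the tight group and the derived radius is much smaller than the one obtained for a sparse point, which is the behavior a per-cluster neighborhood threshold is meant to capture.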

Key words: density clustering, initial point optimization, Reverse Nearest Neighbor (RNN), adaptive radius, similarity matrix

