作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2023, Vol. 49 ›› Issue (6): 53-61. doi: 10.19678/j.issn.1000-3428.0064368

• 人工智能与模式识别 • 上一篇    下一篇

基于相对密度的密度峰值聚类算法

位雅1, 张正军1, 何凯琳1, 唐莉2   

  1. 1. 南京理工大学 数学与统计学院, 南京 210094;
    2. 景德镇学院 信息工程学院, 江西 景德镇 333000
  • 收稿日期:2022-04-02 修回日期:2022-07-19 发布日期:2023-06-10
  • 作者简介:位雅(1997-),女,硕士研究生,主研方向为数据挖掘;张正军,副教授、博士;何凯琳,硕士研究生;唐莉,助教、硕士。
  • 基金资助:
    国家自然科学基金(61773014)。

Density Peak Clustering Algorithm Based on Relative Density

WEI Ya1, ZHANG Zhengjun1, HE Kailin1, TANG Li2   

  1. 1. School of Mathematics and Statistics, Nanjing University of Science and Technology, Nanjing 210094, China;
    2. School of Information Engineering, Jingdezhen University, Jingdezhen 333000, Jiangxi, China
  • Received:2022-04-02 Revised:2022-07-19 Published:2023-06-10

摘要: 密度峰值聚类算法在处理密度不均匀的数据集时易将低密度簇划分到高密度簇中或将高密度簇分为多个子簇,且在样本点分配过程中存在误差传递问题。提出一种基于相对密度的密度峰值聚类算法。引入自然最近邻域内的样本点信息,给出新的局部密度计算方法并计算相对密度。在绘制决策图确定聚类中心后,基于对簇间密度差异的考虑,提出密度因子计算各个簇的聚类距离,根据聚类距离对剩余样本点进行划分,实现不同形状、不同密度数据集的聚类。在合成数据集和真实数据集上进行实验,结果表明,该算法的FMI、ARI和NMI指标较经典的密度峰值聚类算法和其他3种聚类算法分别平均提高约14、26和21个百分点,并且在簇间密度相差较大的数据集上能够准确识别聚类中心和分配剩余的样本点。

关键词: 聚类, 密度峰值, 相对密度, 密度因子, 聚类距离, 自然最近邻

Abstract: When the density peak clustering algorithm deals with datasets with uneven density,it is easy to divide the low-density clusters into high-density clusters,divide the high-density clusters into multiple sub-clusters,and exists the error propagation occurs in the process of sample point allocation.To solve these problems,a density peak clustering algorithm based on relative density is proposed.The proposed algorithm introduces sample point information in the natural nearest neighborhood,provides a new local density calculation method,and calculates the relative density. After drawing a decision diagram to determine the cluster centers,considering the density difference between clusters,a density factor is proposed to calculate the clustering distance of each cluster,and the remaining sample points are divided according to the clustering distance,then the proposed algorithm clusters datasets with different shapes and densities.Experiments are performed on synthetic and real datasets for comparison with the classical density peak clustering algorithm and three other clustering algorithms.The results show that the proposed algorithm increases the Fowlkes and Mallows Index(FMI),Adjusted Rand Index(ARI),and Normalized Mutual Information(NMI) by an average of approximately 14,26,and 21 percentage points,respectively. At the same time,the proposed algorithm has great advantages in accurately identifying cluster centers and assigning the remaining sample points to datasets with large differences in densities between clusters.

Key words: clustering, density peak, relative density, density factor, clustering distance, natural nearest neighborhood

中图分类号: