• Artificial Intelligence and Pattern Recognition •

### 基于相对密度的密度峰值聚类算法

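1. School of Mathematics and Statistics, Nanjing University of Science and Technology, Nanjing 210094;
2. School of Information Engineering, Jingdezhen University, Jingdezhen, Jiangxi 333000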
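• Received: 2022-04-02 Revised: 2022-07-19 Published: 2023-06-10
• About the authors: WEI Ya (born 1997), female, master's student, whose main research interest is data mining; ZHANG Zhengjun, associate professor, Ph.D.; HE Kailin, master's student; TANG Li, teaching assistant, M.S.
• Funding:
National Natural Science Foundation of China (61773014).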

### Density Peak Clustering Algorithm Based on Relative Density

WEI Ya¹, ZHANG Zhengjun¹, HE Kailin¹, TANG Li²

1. School of Mathematics and Statistics, Nanjing University of Science and Technology, Nanjing 210094, China;
2. School of Information Engineering, Jingdezhen University, Jingdezhen 333000, Jiangxi, China
• Received: 2022-04-02 Revised: 2022-07-19 Published: 2023-06-10

Abstract: When the density peak clustering algorithm is applied to datasets with uneven density, it tends to merge low-density clusters into high-density clusters and to split high-density clusters into multiple sub-clusters, and errors propagate during the allocation of sample points. To solve these problems, a density peak clustering algorithm based on relative density is proposed. The proposed algorithm introduces information about the sample points in the natural nearest neighborhood, provides a new method for calculating local density, and calculates the relative density. After a decision graph is drawn to determine the cluster centers, a density factor that accounts for the density difference between clusters is proposed to calculate the clustering distance of each cluster, and the remaining sample points are assigned according to the clustering distance. In this way, the proposed algorithm clusters datasets with different shapes and densities. Experiments are performed on synthetic and real datasets to compare the proposed algorithm with the classical density peak clustering algorithm and three other clustering algorithms. The results show that the proposed algorithm increases the Fowlkes and Mallows Index (FMI), Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI) by an average of approximately 14, 26, and 21 percentage points, respectively. The proposed algorithm also has clear advantages in accurately identifying cluster centers and assigning the remaining sample points on datasets with large differences in density between clusters.
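For orientation, the classical density peak clustering baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the usual formulation only, not the proposed relative-density algorithm: the abstract does not give formulas for the relative density, the density factor, or the clustering distance, so none of them is reproduced here. The function name `density_peak_clustering`, the parameters `d_c` and `n_clusters`, and the Gaussian kernel for local density are assumptions made for this example.

```python
import math


def density_peak_clustering(points, d_c, n_clusters):
    """Classical density peak clustering (baseline sketch, hypothetical helper)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density with a Gaussian kernel; d_c is the cutoff distance (assumed choice).
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Indices sorted from highest to lowest local density.
    order = sorted(range(n), key=lambda i: -rho[i])

    # delta[i]: distance to the nearest point that precedes i in density order.
    # parent[i]: index of that point (-1 for the densest point).
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Cluster centers: points with the largest rho * delta, which is what
    # one would read off the decision graph.
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Each remaining point inherits the label of its denser nearest neighbor.
    # A single wrong parent mislabels every point below it in the chain,
    # which is the error propagation mentioned in the abstract.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Two well-separated toy groups of five points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
    (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
]
labels, centers = density_peak_clustering(points, d_c=1.0, n_clusters=2)
```

On this toy input the two middle points of each group are selected as centers and every other point follows its nearest denser neighbor. Because a single global `d_c` drives the density estimate, the sketch would be expected to struggle in the uneven-density setting the abstract describes.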