Computer Engineering ›› 2025, Vol. 51 ›› Issue (4): 137-148. doi: 10.19678/j.issn.1000-3428.0068956

• Artificial Intelligence and Pattern Recognition •

Adaptive Density Peak Clustering Algorithm Based on Gaussian Distribution

LI Qiwen*, WANG Zhihe, DU Hui, LU Depeng

  1. School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China
  • Received: 2023-12-05 Publication date: 2025-04-15 Release date: 2024-06-03
  • Contact: LI Qiwen
  • Supported by:
    National Natural Science Foundation of China (62372353)

Abstract:

The Density Peak Clustering (DPC) algorithm can discover clusters of arbitrary shape and is robust to noise, so it is widely used in many fields. However, DPC requires cluster centers to be selected manually and performs poorly on datasets with uneven density. To address these problems, this paper proposes an adaptive DPC algorithm based on the Gaussian distribution. First, the product θi of the local density and the relative distance is computed; θi is then mapped by Z-score standardization into a two-dimensional space that follows a Gaussian distribution, and the standard deviation of that distribution is used to select cluster centers adaptively, yielding the set of cluster centers. Second, each remaining data point is assigned to the cluster of its nearest cluster center, giving a preliminary partition. Finally, a suture factor model is designed to compute a suture coefficient between clusters: when the suture coefficient exceeds a threshold, the most similar clusters in the preliminary partition are merged and the similarity matrix is updated, and this is repeated until merging is complete and the final result is obtained. Experimental results on artificial and real datasets show that, compared with the DBSCAN, DPC, and ICKDC algorithms, the proposed algorithm achieves higher clustering accuracy and better clustering performance.

Key words: Density Peak Clustering (DPC) algorithm, Gaussian distribution, Z-score standardization, suture factor, inter-cluster similarity
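
The first two steps described in the abstract (computing θi, standardizing it with a Z-score, picking centers from the standardized values, then assigning the remaining points to the nearest center) can be illustrated with a minimal sketch. Several details below are assumptions, not taken from the paper: the Gaussian-kernel density with cutoff `d_c` follows a common DPC convention, the selection rule `z > k_sigma` and its default of 3 are hypothetical stand-ins for the paper's standard-deviation criterion, and the function names are illustrative. The suture factor merging step is not sketched because the abstract does not define the suture coefficient.

```python
import numpy as np


def select_centers(points, d_c, k_sigma=3.0):
    """Pick cluster centers from Z-scored theta = rho * delta.

    d_c and k_sigma are assumed parameters, not values from the paper.
    """
    # Pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density rho (assumed Gaussian kernel); subtract the self term exp(0) = 1.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    # Relative distance delta: distance to the nearest point of higher density.
    # The densest point gets its largest distance (usual DPC convention).
    n = len(points)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()

    theta = rho * delta
    # Z-score standardization of theta (assumes theta is not constant).
    z = (theta - theta.mean()) / theta.std()

    # Hypothetical rule: a center lies more than k_sigma standard deviations above the mean.
    centers = np.flatnonzero(z > k_sigma)
    return centers, theta, z


def assign_to_nearest_center(points, centers):
    """Label each point with the index (into centers) of its nearest center."""
    d = np.linalg.norm(points[:, None, :] - points[centers][None, :, :], axis=-1)
    return d.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic, well-separated blobs used only for illustration.
    data = np.vstack([
        rng.normal(loc=(0.0, 0.0), scale=0.3, size=(60, 2)),
        rng.normal(loc=(10.0, 10.0), scale=0.3, size=(60, 2)),
    ])
    centers, theta, z = select_centers(data, d_c=0.5)
    labels = assign_to_nearest_center(data, centers)
    print("center indices:", centers)
    print("cluster sizes:", np.bincount(labels))
```

On data like this, points with a large standardized θi stand far apart from the bulk of the distribution, which is what makes a threshold expressed in standard deviations a plausible replacement for manual inspection of a decision graph.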