
Computer Engineering ›› 2022, Vol. 48 ›› Issue (4): 61-69. doi: 10.19678/j.issn.1000-3428.0060648

• Artificial Intelligence and Pattern Recognition •

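Density Peak Algorithm Based on Weighted Shared Nearest Neighbor and Accumulated Sequence

WANG Fuyin, ZHANG Desheng, XIAO Yanting

  1. School of Sciences, Xi'an University of Technology, Xi'an 710054, China
  • Received: 2021-01-20 Revised: 2021-03-28 Published: 2021-04-23
  • About the authors: WANG Fuyin (born 1994), female, master's student, main research interests data mining and cluster analysis; ZHANG Desheng, professor, Ph.D.; XIAO Yanting, associate professor, Ph.D.
  • Funding:
    Young Scientists Fund of the National Natural Science Foundation of China (11801438).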

Density Peak Algorithm Based on Weighted Shared Nearest Neighbor and Accumulated Sequence

WANG Fuyin, ZHANG Desheng, XIAO Yanting   

  1. School of Sciences, Xi'an University of Technology, Xi'an 710054, China
  • Received: 2021-01-20 Revised: 2021-03-28 Published: 2021-04-23

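Abstract: The Density Peak Clustering (DPC) algorithm clusters poorly when the density distribution of the data varies widely. Its results depend on the local density and the relative distance, and the cluster centers have to be picked by hand, which lowers the accuracy and stability of the algorithm. To address this, a density peak algorithm based on weighted shared nearest neighbors and an accumulated sequence, DPC-WSNN, is proposed. The local density is redefined on the basis of weighted shared nearest neighbors, which avoids the effect of a poorly chosen cutoff distance on the clustering result and handles datasets whose clusters are unevenly distributed. Starting from the decision values of the original DPC algorithm, an accumulated sequence is generated, and its mean serves as the threshold separating cluster centers from non-center points, so that cluster centers are selected automatically. DPC-WSNN is tested and evaluated on synthetic datasets and on real datasets from UCI, and compared with FKNN-DPC, DPC, DBSCAN, and other algorithms. The results show that DPC-WSNN clusters better, with relatively high clustering accuracy and relatively strong robustness.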

Keywords: density peak clustering algorithm, local density, weighted shared nearest neighbor, accumulated sequence, cluster center

Abstract: The Density Peak Clustering (DPC) algorithm exhibits poor clustering performance on data with large differences in density distribution. Its clustering results are affected by the local density and the relative distance, and the cluster centers must be selected manually, which reduces accuracy and stability. Therefore, in this study, we propose a density peak algorithm, referred to as DPC-WSNN, based on weighted shared nearest neighbors and an accumulated sequence. The local density is redefined in terms of weighted shared nearest neighbors to avoid the impact of an improperly selected truncation distance on clustering performance and to address the uneven distribution of data across clusters. Based on the decision values of the original DPC algorithm, an accumulated sequence is generated, and the mean of the accumulated sequence is taken as the critical point between cluster centers and non-center points, so that cluster centers are selected automatically. The proposed DPC-WSNN algorithm is tested and evaluated on synthetic datasets and real datasets from UCI, and compared with FKNN-DPC, DPC, DBSCAN, and other algorithms. The results show that DPC-WSNN achieves better clustering performance, with relatively high clustering accuracy and strong robustness.

Key words: Density Peak Clustering (DPC) algorithm, local density, weighted shared nearest neighbor, accumulated sequence, clustering center
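
For readers unfamiliar with the quantities the abstract builds on, the following is a minimal sketch of the baseline DPC decision values (cutoff-kernel local density, relative distance, and their product) together with a plain, unweighted shared-nearest-neighbor count. The function names, the cutoff `d_c`, and the neighbor count `k` are illustrative assumptions. The paper's weighting scheme and its accumulated-sequence threshold are not defined in the abstract and are not reproduced here.

```python
import numpy as np


def _pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def dpc_decision_values(X, d_c):
    """Baseline DPC quantities (hypothetical helper, cutoff kernel).

    rho[i]   : number of other points closer than d_c to point i
    delta[i] : distance to the nearest point with strictly higher rho
               (for points of maximal rho: the largest distance from i)
    gamma[i] : rho[i] * delta[i], the decision value
    """
    dist = _pairwise_distances(X)
    n = dist.shape[0]
    # Subtract 1 so that a point does not count itself.
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            # Simplification: tied density maxima all take this branch.
            delta[i] = dist[i].max()
    gamma = rho * delta
    return rho, delta, gamma


def shared_neighbor_count(X, k):
    """snn[i, j] = number of points in both k-nearest-neighbor sets.

    Unweighted count only; any weighting is left out because the
    abstract does not specify it.
    """
    dist = _pairwise_distances(X)
    n = dist.shape[0]
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(dist, np.inf)
    knn = np.argsort(dist, axis=1, kind="stable")[:, :k]
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1
    return member @ member.T
```

In baseline DPC, the points with the largest `gamma` are inspected and chosen as centers by hand; this is the manual step that the abstract says DPC-WSNN replaces with a threshold derived from an accumulated sequence.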

CLC number: