A Density Clustering Algorithm with Optimized Initial Points and Adaptive Radius

doi:10.19678/j.issn.1000-3428.0059770

Abstract: The DBSCAN algorithm cannot accurately cluster datasets with uneven densities, and its clustering results are strongly affected by two parameters, the neighborhood threshold and the density threshold. This paper proposes a density clustering algorithm with optimized initial points and an adaptive radius. The algorithm uses the Reverse Nearest Neighbor (RNN) and a similarity matrix to find the sample point with the largest global density. It then analyzes the density distribution around that sample, calculates the neighborhood threshold of the current cluster adaptively, and clusters with the DBSCAN algorithm. Experimental results on artificial datasets and UCI datasets show that, compared with the DBSCAN, OPTICS, and RNN-DBSCAN algorithms, the proposed algorithm achieves the best overall scores on five evaluation indexes (ARI, NMI, Homogeneity, Completeness, and V-measure), reaching 1.0 on datasets including Compound and Jain. The proposed algorithm thus offers high clustering efficiency and accuracy.
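The abstract names the ingredients of the method but gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that density is estimated by the reverse nearest neighbor count (how many points list a given point among their k nearest neighbors) and, purely for illustration, takes the adaptive radius to be the mean distance from the densest point to its k nearest neighbors. The function names, the parameter `k`, and the radius rule are all hypothetical, and the similarity matrix mentioned in the abstract is not modeled here.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (itself excluded)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


def reverse_nn_counts(neighbors):
    """Count how many points include each point among their k nearest neighbors."""
    counts = [0] * len(neighbors)
    for nbrs in neighbors:
        for j in nbrs:
            counts[j] += 1
    return counts


def densest_point_and_radius(points, k):
    """Pick the point with the highest reverse-nearest-neighbor count as the seed.

    ASSUMPTION: the radius rule below (mean distance from the seed to its
    k nearest neighbors) is an illustrative stand-in; the abstract does not
    state how the neighborhood threshold is actually computed.
    """
    neighbors = knn_indices(points, k)
    counts = reverse_nn_counts(neighbors)
    seed = max(range(len(points)), key=lambda i: counts[i])
    eps = sum(math.dist(points[seed], points[j]) for j in neighbors[seed]) / k
    return seed, eps


# A tight group of five points and a sparse group of three.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (14, 10), (10, 14)]
seed, eps = densest_point_and_radius(points, k=2)
print(seed, round(eps, 4))
```

In this toy dataset the seed falls in the tight group, and the radius derived from it is much smaller than the spacing of the sparse group. A DBSCAN-style expansion would then start from `seed` with `eps` as its neighborhood threshold, and the selection would be repeated on the remaining unclustered points so that each cluster gets its own radius.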

Key words: density clustering, initial point optimization, Reverse Nearest Neighbor (RNN), adaptive radius, similarity matrix

CLC Number: TP18

WANG Zhihe, CAO Xuyan, DU Hui. A Density Clustering Algorithm with Optimized Initial Points and Adaptive Radius[J]. Computer Engineering, 2022, 48(1): 51-59.

王治和, 曹旭琰, 杜辉. 一种优化初始点与自适应半径的密度聚类算法[J]. 计算机工程, 2022, 48(1): 51-59.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0059770

http://www.ecice06.com/EN/Y2022/V48/I1/51
