Computer Engineering, 2020, Vol. 46, Issue (5): 144-149. doi: 10.19678/j.issn.1000-3428.0054716

• Advanced Computing and Data Processing •

An Anti-Noise Travel-Time Potential Energy Clustering Algorithm

LU Shentao (a,b), GE Hongwei (a,b)

  1. a. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence; b. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2019-04-24 Revised: 2019-05-25 Published: 2019-05-24
  • About the authors: LU Shentao (born 1994), male, master's student, whose main research interests are artificial intelligence and pattern recognition; GE Hongwei, professor, Ph.D.
  • Funding:
    Postgraduate Research and Innovation Program of Jiangsu Higher Education Institutions (KYLX16_0781); Priority Academic Program Development of Jiangsu Higher Education Institutions.

Abstract: Travel-Time based Hierarchical Clustering (TTHC) is a potential energy clustering algorithm that clusters well but cannot identify the noisy data points in a dataset. To address this, the paper proposes an anti-noise travel-time potential energy clustering algorithm. The parent node of each data point is found from the potential energy values of the data points and the similarity between them, and the distance from each data point to its parent node is computed. A λ value is then obtained for each data point from this distance and the point's potential energy. The λ values are arranged by size into an increasing curve, and the noise points are identified from the inflection points of that curve and assigned to a new cluster. On the dataset with the noise points removed, hierarchical clustering is performed according to the distance between each data point and its parent node to obtain the clustering result. Experimental results show that the proposed algorithm identifies the noisy data points in the datasets and thereby obtains better clustering results.

Key words: clustering algorithm, potential energy, Travel-Time based Hierarchical Clustering (TTHC), noise recognition, datasets
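
The noise-identification steps summarized in the abstract (potential energy, parent node, λ value, increasing curve, inflection point) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so several parts here are assumptions rather than the authors' definitions: the gravity-like potential, the rule that a point's parent is its nearest neighbor with strictly lower potential, the form λ = parent distance / |potential|, and the chord-based knee rule used to locate the inflection point. The function name `detect_noise` and the parameter `eps` are hypothetical. The final hierarchical clustering step is not shown.

```python
import numpy as np

def detect_noise(points, eps=1e-9):
    """Return (noise_mask, lam) for an (n, d) array of points."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        return np.zeros(n, dtype=bool), np.zeros(n)

    # Pairwise Euclidean distances.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

    # ASSUMPTION: gravity-like potential, phi_i = -sum_{j != i} 1 / max(d_ij, eps).
    inv = 1.0 / np.maximum(dist, eps)
    np.fill_diagonal(inv, 0.0)
    phi = -inv.sum(axis=1)

    # ASSUMPTION: parent = nearest point with strictly lower potential.
    # A point with no such neighbor (a root) keeps parent distance 0.
    d_parent = np.zeros(n)
    for i in range(n):
        lower = np.flatnonzero(phi < phi[i])
        if lower.size:
            d_parent[i] = dist[i, lower].min()

    # ASSUMPTION: lambda grows with parent distance and shrinks with |potential|.
    lam = d_parent / np.abs(phi)

    # Sort lambda into an increasing curve; take as the knee the point that
    # lies farthest below the chord joining the curve's two endpoints.
    order = np.argsort(lam)
    curve = lam[order]
    noise = np.zeros(n, dtype=bool)
    span = curve[-1] - curve[0]
    if n < 3 or span == 0:
        return noise, lam
    x = np.arange(n) / (n - 1)
    y = (curve - curve[0]) / span
    knee = int(np.argmax(x - y))
    noise[order[knee + 1:]] = True  # everything past the knee is flagged
    return noise, lam

# Toy data: a 5 x 4 grid (indices 0-19) plus two isolated points (20 and 21).
grid = [(x, y) for x in range(5) for y in range(4)]
data = np.array(grid + [(20, 20), (-15, 10)], dtype=float)
noise, lam = detect_noise(data)
print(np.flatnonzero(noise))
```

One limitation of this knee rule is that it flags at least one point whenever the λ values are not all equal, so a practical version would need an extra check for datasets that contain no noise.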

CLC Number: