
Computer Engineering, 2021, Vol. 47, Issue 3: 109-116. doi: 10.19678/j.issn.1000-3428.0056670

• Artificial Intelligence and Pattern Recognition •

Rough K-Means Algorithm Based on Mixed Measure of Neighborhood Partition Information

SUN Jingyong, MA Fumin   

  1. College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China
  • Received: 2019-11-21; Revised: 2020-01-21; Published: 2020-03-04

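  • About the authors: SUN Jingyong (born 1996), male, master's degree candidate, research interest: intelligent information processing; MA Fumin (corresponding author), professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61973151); Natural Science Foundation of Jiangsu Province (BK20191376); Major Project of Natural Science Research of Jiangsu Higher Education Institutions (17KJA120001).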

Abstract: In Rough K-Means (RKM) and its derivative clustering algorithms, a data object in the boundary region lies at nearly the same distance from several cluster centers, so it is difficult to assign such objects by distance or density alone. This paper proposes a new rough K-Means algorithm that, when partitioning the data, combines the local density of data objects with their neighborhood membership information to measure the similarity between data points and clusters. The relationship between boundary data and clusters is thus determined by their local spatial distribution, which makes the differences among fuzzy, uncertain information more pronounced. Experimental results on an artificial dataset and UCI standard datasets show that the proposed algorithm partitions boundary-region data with higher accuracy.
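To make the boundary problem concrete, here is a minimal Python sketch of a classical rough K-Means loop in the style of Lingras and West. It is not the algorithm proposed in this paper. It assumes a difference threshold `epsilon` for boundary membership and a lower-approximation weight `w_lower` for the centroid update; these names and their default values are illustrative assumptions. A point whose distances to two centers differ by at most `epsilon` is placed in the boundary of both clusters, which is exactly the case where distance alone cannot decide.

```python
import math
from typing import List, Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def rough_assign(data, centers, epsilon):
    """Return (lower, boundary): point indices per cluster.

    If another center is within `epsilon` of the nearest-center distance,
    the point goes to the boundary of every such cluster (difference rule,
    an assumption); otherwise it joins the lower approximation of its
    nearest cluster.
    """
    k = len(centers)
    lower: List[List[int]] = [[] for _ in range(k)]
    boundary: List[List[int]] = [[] for _ in range(k)]
    for idx, x in enumerate(data):
        d = [dist(x, c) for c in centers]
        nearest = min(range(k), key=d.__getitem__)
        close = [j for j in range(k) if j != nearest and d[j] - d[nearest] <= epsilon]
        if close:
            for j in [nearest] + close:
                boundary[j].append(idx)
        else:
            lower[nearest].append(idx)
    return lower, boundary


def update_centers(data, centers, lower, boundary, w_lower=0.7):
    """Centroid = w_lower*mean(lower) + (1 - w_lower)*mean(boundary)."""
    new = []
    for i, c in enumerate(centers):
        low = [data[j] for j in lower[i]]
        bnd = [data[j] for j in boundary[i]]
        if low and bnd:
            ml, mb = mean(low), mean(bnd)
            new.append([w_lower * a + (1 - w_lower) * b for a, b in zip(ml, mb)])
        elif low:
            new.append(mean(low))
        elif bnd:
            new.append(mean(bnd))
        else:
            new.append(list(c))  # empty cluster: keep the old center
    return new


def rough_kmeans(data, centers, epsilon=0.5, w_lower=0.7, max_iter=20):
    """Alternate assignment and update until the centers stop moving."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        lower, boundary = rough_assign(data, centers, epsilon)
        new = update_centers(data, centers, lower, boundary, w_lower)
        converged = all(dist(a, b) < 1e-9 for a, b in zip(new, centers))
        centers = new
        if converged:
            break
    # Recompute the approximations for the final centers.
    lower, boundary = rough_assign(data, centers, epsilon)
    return centers, lower, boundary


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 0), (9, 0), (10, 0)]
    print(rough_kmeans(pts, [(0, 0), (10, 0)]))
```

With initial centers at (0, 0) and (10, 0), the point (5, 0) is equidistant from both and lands in both boundary sets, while the other four points fall into lower approximations. Such equidistant boundary objects are the ones the abstract says are hard to separate by distance.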

Key words: rough set, K-Means algorithm, local density, neighborhood information, intra-cluster similarity
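The paper's actual mixed measure is not given in this abstract. The Python sketch below only illustrates the general idea of scoring a boundary point against its candidate clusters using neighborhood information and local density. The k-nearest-neighbor vote, the inverse-mean-distance density, the weight `alpha`, the margin `delta`, and all function names are hypothetical choices made for illustration, not the authors' formulas.

```python
import math
from typing import Dict, List, Optional, Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(data, idx: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of data[idx], excluding idx itself."""
    others = [j for j in range(len(data)) if j != idx]
    others.sort(key=lambda j: dist(data[idx], data[j]))
    return others[:k]


def local_density(data, idx: int, k: int) -> float:
    """Hypothetical density: inverse of the mean distance to the k nearest neighbors."""
    nbrs = knn(data, idx, k)
    mean_d = sum(dist(data[idx], data[j]) for j in nbrs) / len(nbrs)
    return 1.0 / (mean_d + 1e-12)  # small constant avoids division by zero


def mixed_scores(data, idx, lower, candidates, k=3, alpha=0.5) -> Dict[int, float]:
    """Score boundary point `idx` against each candidate cluster.

    alpha weights the share of the k nearest neighbors lying in a cluster's
    lower approximation; (1 - alpha) weights the same share with each
    neighbor counted by its local density. Both terms are illustrative.
    """
    label = {j: c for c, members in enumerate(lower) for j in members}
    nbrs = knn(data, idx, k)
    rho = {j: local_density(data, j, k) for j in nbrs}
    total_rho = sum(rho.values())
    scores = {}
    for c in candidates:
        in_c = [j for j in nbrs if label.get(j) == c]
        vote = len(in_c) / len(nbrs)
        dens_vote = sum(rho[j] for j in in_c) / total_rho
        scores[c] = alpha * vote + (1 - alpha) * dens_vote
    return scores


def resolve(scores: Dict[int, float], delta: float = 0.2) -> Optional[int]:
    """Return the best cluster if it wins by at least delta, else None (stay in boundary)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) == 1 or scores[ranked[0]] - scores[ranked[1]] >= delta:
        return ranked[0]
    return None
```

For example, with a dense cluster at x = 0, 0.2, 0.4 and a sparse one at x = 3, 5, 7 (all on y = 0), a boundary point at x = 1.65 has two of its three nearest neighbors in the dense cluster. Weighting the vote by neighbor density raises the dense cluster's score above the plain 2/3 vote, so `resolve` assigns the point there instead of leaving it in the boundary region.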