
Computer Engineering ›› 2022, Vol. 48 ›› Issue (7): 122-129. doi: 10.19678/j.issn.1000-3428.0062092

• Artificial Intelligence and Pattern Recognition •

Fuzzy Weighted k-Nearest Centroid Neighbor Algorithm Based on Membership

LIU Li, ZHANG Desheng, XIAO Yanting

  1. School of Sciences, Xi'an University of Technology, Xi'an 710054, China
  • Received: 2021-07-14 Revised: 2021-08-27 Online: 2022-07-15 Published: 2021-09-06
  • About the authors: LIU Li (born 1997), female, master's student; her main research interests are data mining and classification analysis. ZHANG Desheng, professor, Ph.D. XIAO Yanting, associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China, Young Scientists Fund (11801438).

Abstract: The classification results of the Fuzzy k-Nearest Centroid Neighbor (FKNCN) algorithm are susceptible to noise points and outliers. The algorithm also treats all sample features equally, so it cannot reflect the differences among features. To address these two problems, a membership-based fuzzy weighted k-nearest centroid neighbor algorithm, MRFKNCN, is proposed. First, a new membership function is constructed using the idea of density clustering and used to calculate the membership degree of each training sample, which reduces the influence of noisy or outlying samples on the classification results. Then, a Relief-F algorithm based on redundancy analysis is designed to calculate the weight of each feature; features with small weights and redundant features are removed, and k representative nearest centroid neighbors are selected using the weighted Euclidean distance to improve classification performance. Finally, the class of the sample to be classified is determined by the maximum membership principle. The MRFKNCN algorithm is tested on multiple datasets from UCI and KEEL and compared with the KNN, KNCN, LMKNCN, FKNN, FKNCN2, and BMFKNCN algorithms. The experimental results show that MRFKNCN clearly outperforms the six comparison algorithms in classification performance, improving average accuracy by up to 4.68 percentage points.

Key words: Fuzzy k-Nearest Centroid Neighbor (FKNCN) algorithm, membership, redundancy analysis, feature selection, data classification
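
The last two steps described in the abstract (selecting k nearest centroid neighbors under a weighted Euclidean distance, then deciding the class by maximum membership) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the density-based membership function or the redundancy-aware Relief-F procedure, so the sketch assumes that the per-sample memberships and the feature weights are supplied by the caller. The vote uses the standard inverse-distance fuzzy k-NN form with fuzzifier `m`, which is also an assumption, and all function names are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Feature-weighted Euclidean distance between two points."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def k_nearest_centroid_neighbors(train, query, k, weights):
    """Return indices of the k nearest centroid neighbors of `query`.

    At each step, pick the unused training point whose inclusion puts the
    centroid of the selected set closest to the query.
    """
    chosen = []
    running_sum = [0.0] * len(query)
    remaining = set(range(len(train)))
    for _ in range(min(k, len(train))):
        best_idx, best_dist = None, math.inf
        n = len(chosen) + 1
        for i in sorted(remaining):
            centroid = [(s + x) / n for s, x in zip(running_sum, train[i])]
            d = weighted_distance(centroid, query, weights)
            if d < best_dist:
                best_idx, best_dist = i, d
        chosen.append(best_idx)
        remaining.remove(best_idx)
        running_sum = [s + x for s, x in zip(running_sum, train[best_idx])]
    return chosen


def fuzzy_classify(train, memberships, query, k, weights, m=2.0, eps=1e-12):
    """Classify `query` by the maximum membership principle.

    memberships[i] maps class labels to the membership degree of train[i].
    ASSUMPTION: these degrees and `weights` are computed elsewhere; the
    paper's own membership function and Relief-F variant are not shown here.
    `m` must be greater than 1.
    """
    neighbors = k_nearest_centroid_neighbors(train, query, k, weights)
    scores, total = {}, 0.0
    for i in neighbors:
        d = weighted_distance(train[i], query, weights)
        vote = 1.0 / (d ** (2.0 / (m - 1.0)) + eps)  # closer neighbors count more
        total += vote
        for label, u in memberships[i].items():
            scores[label] = scores.get(label, 0.0) + u * vote
    scores = {label: s / total for label, s in scores.items()}
    return max(scores, key=scores.get), scores
```

Unlike plain k-NN, the centroid criterion favors neighbors that surround the query: for the points (1, 0), (1.1, 0), and (-1.5, 0) with the query at the origin and k = 2, the second neighbor chosen is (-1.5, 0) rather than the closer (1.1, 0), because it pulls the centroid nearer to the query.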
