Computer Engineering

• Artificial Intelligence and Recognition Technology •

  • About the authors: LI Jinmeng (b. 1992), male, master's student; his research interests include machine learning. LIN Yaping, professor. ZHU Tuanfei, Ph.D. candidate.

k-Nearest Neighbor Classification Algorithm Based on Hubness and Class Weighting

LI Jinmeng,LIN Yaping,ZHU Tuanfei   

  1. (College of Computer Science and Electronic Engineering,Hunan University,Changsha 410082,China)
  • Received:2017-03-15 Online:2018-04-15 Published:2018-04-15



Abstract: To address the curse of dimensionality and skewed class distribution in high-dimensional imbalanced data, an improved k-Nearest Neighbor (kNN) classification algorithm named HWNN is proposed. The k-occurrence distribution of each sample is treated as its support for every class at prediction time; in this way, the potential negative impact of hubs on kNN classification in high-dimensional data is reduced. To improve prediction accuracy on minority class samples, class weighting is used to increase the proportion of minority classes in the k-occurrences of all samples. Experimental results on 16 imbalanced UCI datasets show that the proposed algorithm outperforms several typical kNN methods on high-dimensional imbalanced data, and that its advantage is equally clear on imbalanced data of ordinary dimensionality.
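The abstract only outlines HWNN; the full algorithm is given in the paper itself. As an illustration of the underlying idea described above — each training sample's class-conditional k-occurrence (how often it appears among the k nearest neighbors of samples of each class) serves as its class support at prediction time, with occurrences reweighted toward minority classes — here is a minimal sketch. Function names, the inverse-frequency weighting, and the add-one smoothing are our assumptions, not necessarily the paper's exact formulation:

```python
import numpy as np

def fit_hubness_support(X, y, k, n_classes):
    """Build class-support vectors from class-conditional k-occurrence.

    occ[i, c] counts how often training sample i appears among the
    k nearest neighbours of class-c training samples (its hubness
    profile, split by the class of the queries it serves).
    """
    n = len(X)
    # brute-force pairwise Euclidean distances (fine for a sketch)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a sample is not its own neighbour
    occ = np.zeros((n, n_classes))
    for j in range(n):
        for i in np.argsort(d[j])[:k]:     # the k-NN list of sample j
            occ[i, y[j]] += 1.0
    # class weighting (assumed inverse-frequency form): inflate the
    # occurrences contributed by minority-class queries
    class_sizes = np.bincount(y, minlength=n_classes)
    occ *= len(y) / (n_classes * np.maximum(class_sizes, 1))
    # add-one smoothing so zero-occurrence samples still cast a vote
    occ += 1.0
    return occ / occ.sum(axis=1, keepdims=True)

def predict(Xq, X, support, k):
    """Each of a query's k nearest neighbours votes with its
    class-support vector instead of a plain one-hot label."""
    preds = []
    for q in Xq:
        d = np.linalg.norm(X - q, axis=1)
        nn = np.argsort(d)[:k]
        preds.append(int(np.argmax(support[nn].sum(axis=0))))
    return np.array(preds)
```

Replacing one-hot neighbor votes with k-occurrence-based support is what dampens hubs: a bad hub that frequently appears in neighborhoods of the "wrong" class carries that evidence in its support vector instead of repeatedly casting a misleading crisp vote.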

Key words: hubness phenomenon, high-dimensional imbalanced data, curse of dimensionality, data classification, k-occurrence, k-Nearest Neighbor (kNN) classification
