Computer Engineering ›› 2021, Vol. 47 ›› Issue (11): 100-107. doi: 10.19678/j.issn.1000-3428.0059373

• Artificial Intelligence and Pattern Recognition •

Research on Feature Selection Algorithms Based on Unbalanced Data

WANG Junhong1,2, ZHAO Binjia1

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
  • Received: 2020-08-27 Revised: 2020-10-21 Published: 2020-10-27
  • About the authors: WANG Junhong (born 1979), female, associate professor, Ph.D.; her main research interests are data mining and machine learning. ZHAO Binjia, master's student.
  • Funding:
    National Natural Science Foundation of China (61976128); Natural Science Foundation of Shanxi Province (201701D121051).

Abstract: Unbalanced classification problems are widespread in medicine, economics and other fields. For the classification of unbalanced data sets, especially high-dimensional ones, an effective feature selection algorithm is crucial. However, most feature selection algorithms do not consider the effect of feature synergy, which degrades classification performance. This paper improves the FAST feature selection algorithm and, taking feature synergy into account, proposes a new feature selection algorithm, FSBS. The algorithm uses AUC to evaluate features and mutual gain to measure the magnitude of their synergy, selects the effective features, and then classifies the unbalanced data. Experimental results show that the proposed algorithm selects features effectively and maintains high classification accuracy, especially when the number of features is small.

Key words: feature selection, unbalanced data, FSBS algorithm, feature synergy, classification accuracy
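
As a rough illustration of the AUC-based feature evaluation mentioned in the abstract, the sketch below scores each feature by the AUC obtained when its raw value alone is used to rank samples. It is not the FSBS algorithm itself: the function names `feature_auc` and `rank_features_by_auc`, the toy data, and the choice to fold AUC values below 0.5 are assumptions made for illustration. The mutual-gain synergy step is not shown because the abstract does not define it.

```python
from typing import List, Sequence, Tuple


def feature_auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of one feature used directly as a ranking score.

    Computed as the fraction of (positive, negative) sample pairs in which
    the positive sample has the larger feature value; ties count as 0.5.
    Label 1 is assumed to mark the minority (positive) class.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    auc = wins / (len(pos) * len(neg))
    # Assumption: a feature that orders the classes in reverse is treated
    # as equally informative, so AUC below 0.5 is folded to 1 - AUC.
    return max(auc, 1.0 - auc)


def rank_features_by_auc(
    X: Sequence[Sequence[float]], y: Sequence[int]
) -> List[Tuple[int, float]]:
    """Return (feature index, AUC) pairs sorted from best to worst."""
    n_features = len(X[0])
    scores = [
        (j, feature_auc([row[j] for row in X], y)) for j in range(n_features)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy unbalanced data: 4 majority samples (label 0), 2 minority (label 1).
X = [
    [0.1, 5.0],
    [0.2, 1.0],
    [0.3, 4.0],
    [0.4, 2.0],
    [0.9, 3.0],
    [0.8, 6.0],
]
y = [0, 0, 0, 0, 1, 1]

ranking = rank_features_by_auc(X, y)
print(ranking)  # feature 0 separates the classes perfectly; feature 1 does not
```

AUC is a natural score here because it depends only on how the two classes are ordered, not on their relative sizes, so a heavily skewed class ratio does not inflate it the way plain accuracy can.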
