
Computer Engineering


An Ensemble Pruning Method for Imbalanced Data Classification

ZHANG Yin-feng, GUO Hua-ping, ZHI Wei-mei, FAN Ming   

  1. (School of Information Engineering, Zhengzhou University, Zhengzhou 450052, China)
  • Received: 2013-03-31   Online: 2014-06-15   Published: 2014-06-13

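  • About the authors: ZHANG Yin-feng (b. 1987), male, master's student, main research interest: data mining; GUO Hua-ping, Ph.D. candidate; ZHI Wei-mei, lecturer and Ph.D. candidate; FAN Ming, professor and doctoral supervisor.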

Abstract: Traditional classification algorithms are mostly designed for balanced data sets, so their performance degrades on imbalanced data; practice shows that ensemble selection can effectively improve classification performance on such data. This paper therefore approaches class-imbalanced learning from the perspective of ensemble selection and proposes a simple but effective Ensemble Pruning Method Based on Positive Examples (EPPE) to improve the classification performance of ensembles on imbalanced data sets. EPPE uses Bagging to build a classifier library, directly treats the positive (minority-class) instances as the pruning set, and uses the MBM metric together with the pruning set to select an optimal or sub-optimal sub-ensemble from the library as the target classifier for predicting unlabeled instances. Experimental results on twelve UCI data sets indicate that, compared with EasyEnsemble, Bagging and C4.5, EPPE not only substantially improves the recall of the ensemble on the positive (minority) class, but also increases overall accuracy.

Key words: imbalanced data set, ensemble pruning, pruning set, assessment metrics, base classifier
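
The pipeline summarized in the abstract (a Bagging-built classifier library, a pruning set made only of positive instances, and metric-guided selection of a sub-ensemble) can be illustrated with a minimal Python sketch. The abstract does not define the MBM metric, so the sketch substitutes a plainly different stand-in score: the majority-vote recall on the positive pruning set. The base learner (a 1-D threshold stump), the greedy forward search, and all function names are likewise assumptions made for illustration and are not taken from the paper.

```python
import random

def train_stump(sample):
    """Fit a 1-D threshold classifier: predict 1 (positive) when x >= t."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in sample}):
        acc = sum((x >= t) == bool(y) for x, y in sample) / len(sample)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def bagging(data, n_classifiers, rng):
    """Build the classifier library, one stump per bootstrap sample."""
    return [train_stump([rng.choice(data) for _ in data])
            for _ in range(n_classifiers)]

def vote(thresholds, x):
    """Majority vote of the stumps; ties go to the negative class."""
    positives = sum(x >= t for t in thresholds)
    return 1 if 2 * positives > len(thresholds) else 0

def positive_recall(thresholds, pruning_set):
    """Stand-in score (NOT the paper's MBM): recall on positive instances."""
    return sum(vote(thresholds, x) for x in pruning_set) / len(pruning_set)

def greedy_prune(library, pruning_set, size):
    """Forward selection: repeatedly add the stump that maximizes the score."""
    chosen, remaining = [], list(library)
    while remaining and len(chosen) < size:
        best = max(remaining,
                   key=lambda t: positive_recall(chosen + [t], pruning_set))
        chosen.append(best)
        remaining.remove(best)
    return chosen

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical imbalanced toy data: 90 negatives, 10 positives.
    data = [(rng.gauss(0, 1), 0) for _ in range(90)]
    data += [(rng.gauss(2, 1), 1) for _ in range(10)]
    library = bagging(data, 25, rng)
    pruning_set = [x for x, y in data if y == 1]  # positives only
    sub = greedy_prune(library, pruning_set, 5)
    print("full library recall:", positive_recall(library, pruning_set))
    print("sub-ensemble recall:", positive_recall(sub, pruning_set))
```

Because the stand-in score looks only at positive instances, it favors stumps with low thresholds and can trade away accuracy on the negative class. The abstract reports that EPPE improves overall accuracy as well, so the actual MBM metric presumably guards against this; the sketch makes no attempt to reproduce that behavior.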
