
Computer Engineering ›› 2007, Vol. 33 ›› Issue (10): 207-209. doi: 10.3969/j.issn.1000-3428.2007.10.074

• Artificial Intelligence and Recognition Technology •

Over-sampling Algorithm Based on Adaboost in Unbalanced Data Set

HAN Hui1, WANG Wenyuan1, MAO Binghuan2

  1. Department of Automation, Tsinghua University, Beijing 100084; 2. Department of Statistics, Central University of Finance and Economics, Beijing 100081
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-05-20 Published:2007-05-20

Abstract: To improve the classification performance of the minority class in unbalanced data sets, this paper combines the advantages of boosting and over-sampling and presents MCMO-Boost, an over-sampling algorithm based on the boosting algorithm Adaboost. MCMO-Boost is compared experimentally with the decision tree algorithm C4.5, the boosting algorithm Adaboost, and the over-sampling algorithm SMOTE. The results show that MCMO-Boost outperforms the other algorithms in classification performance on both the minority class and the data set as a whole.

Key words: Unbalanced data set, Over-sampling, Boosting algorithm
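
For readers unfamiliar with over-sampling, the following is a minimal sketch of the interpolation idea behind SMOTE, one of the baselines named in the abstract. It is not MCMO-Boost, whose details are not given on this page. The function name `smote_like_oversample`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between
    a minority sample and one of its k nearest minority neighbours.

    Illustrative sketch only; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base towards neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class grows without exact duplication. A boosting-based variant would presumably choose which samples to interpolate using information from the boosting rounds, but that step is an assumption and is not shown here.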
