Computer Engineering ›› 2010, Vol. 36 ›› Issue (24): 161-163. doi: 10.3969/j.issn.1000-3428.2010.24.058

• Artificial Intelligence and Recognition Technology •

Gradually Learning Algorithm for Imbalanced Data

DONG Yuan-fang 1,2a, LI Xiong-fei 1, LI Jun 1,2b

  (1. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China; 2. Changchun University of Science and Technology, a. School of Economics and Management; b. Dept. of Mathematics, Changchun 130022, China)
  • Online: 2010-12-20 Published: 2010-12-14
  • About the authors: DONG Yuan-fang (born 1975), female, lecturer and Ph.D. candidate, research interests: rough set theory, data mining; LI Xiong-fei, professor and doctoral supervisor; LI Jun, associate professor and Ph.D. candidate
  • Supported by:

    National Key Technology R&D Program of China (2006BAK01A33); Science and Technology Development Program of Jilin Province (20070321, 20090704)

Abstract:

To address the problem of learning from imbalanced data, a classification algorithm based on gradual learning is proposed. The algorithm adds synthetic minority-class examples step by step according to the value-range distribution of each attribute, and promptly removes any synthetic example that the stage classifier misclassifies. Once the data reach the desired degree of balance, the learning algorithm is trained on the original data together with the synthetic data to obtain the final classifier. Experimental results show that the algorithm outperforms C4.5, and outperforms SMOTEBoost and DataBoost-IM on most data sets.
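The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how synthetic examples are drawn or which stage classifier is used, so the sketch assumes uniform sampling inside each attribute's observed minority-class range and substitutes a nearest-centroid classifier for a C4.5-style learner. All names (`gradual_balance`, `train_centroids`, `predict`, `target_ratio`, `batch`) are hypothetical.

```python
import random


def train_centroids(X, y):
    """Stand-in stage classifier (nearest centroid), NOT the C4.5 learner of the paper."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict(model, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def gradual_balance(X, y, minority, target_ratio=1.0, batch=5, max_rounds=200, seed=0):
    """Add synthetic minority examples in small batches, dropping misclassified ones.

    Assumption: each synthetic attribute value is drawn uniformly from the
    minority class's observed [min, max] range for that attribute.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, t in zip(X, y) if t == minority]
    n_major = sum(1 for t in y if t != minority)
    lo = [min(col) for col in zip(*minority_rows)]
    hi = [max(col) for col in zip(*minority_rows)]
    synthetic = []
    for _ in range(max_rounds):
        # Stop once the minority side reaches the desired degree of balance.
        need = int(target_ratio * n_major) - (len(minority_rows) + len(synthetic))
        if need <= 0:
            break
        # Stage step 1: add a small batch of synthetic minority examples.
        for _ in range(min(batch, need)):
            synthetic.append([rng.uniform(a, b) for a, b in zip(lo, hi)])
        # Stage step 2: train the stage classifier on original plus synthetic data.
        stage = train_centroids(X + synthetic, y + [minority] * len(synthetic))
        # Stage step 3: delete synthetic examples the stage classifier misclassifies.
        synthetic = [s for s in synthetic if predict(stage, s) == minority]
    # Final classifier: trained on the original data together with the kept synthetic data.
    final = train_centroids(X + synthetic, y + [minority] * len(synthetic))
    return final, synthetic
```

Calling `gradual_balance(X, y, minority=1)` on a list of numeric feature rows `X` and labels `y` returns the final model and the retained synthetic examples. The `max_rounds` cap is a safeguard added for the sketch, so that the loop ends even if most synthetic examples keep being rejected.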

Keywords: classification, imbalanced data, gradual learning

CLC number: