Abstract:
To address the problem of learning from imbalanced data, a gradual learning classification algorithm is proposed. The algorithm gradually adds synthetic minority-class examples according to the distribution of attribute value ranges, and promptly removes any synthetic examples that the stage classifier misclassifies. Once the data reaches the desired degree of balance, the learning algorithm is trained on the original and synthetic data to obtain the final classifier. Experimental results show that the proposed algorithm outperforms C4.5, and outperforms SMOTEBoost and DataBoost-IM on most data sets.
Key words:
classification,
imbalanced data,
gradual learning
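The loop described in the abstract (add synthetic minority examples in stages, drop those the stage classifier misclassifies, stop at the desired balance, then train the final classifier) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how examples are drawn from the attribute value ranges, so uniform sampling within each attribute's observed minority range is assumed here, and a simple nearest-centroid classifier stands in for the base learner. The names `fit_centroids`, `predict`, `gradual_oversample`, `target_ratio`, `batch`, and `max_rounds` are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Stand-in learner (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the class whose centroid is closest to `row`."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )


def gradual_oversample(X, y, minority, target_ratio=1.0, batch=5,
                       max_rounds=50, seed=0):
    """Grow a synthetic minority set in stages, pruning misclassified ones."""
    rng = random.Random(seed)
    minority_rows = [r for r, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_rows)
    target = int(target_ratio * n_majority)  # desired minority count

    # Observed value range of each attribute over the real minority examples.
    ranges = [(min(col), max(col)) for col in zip(*minority_rows)]

    synthetic = []
    for _ in range(max_rounds):  # guard against never reaching the target
        missing = target - len(minority_rows) - len(synthetic)
        if missing <= 0:
            break
        # Assumption: draw each attribute uniformly from its observed range.
        candidates = [
            [rng.uniform(lo, hi) for lo, hi in ranges]
            for _ in range(min(batch, missing))
        ]
        pool = synthetic + candidates
        # Stage classifier trained on the raw data plus all synthetic examples.
        stage = fit_centroids(X + pool, y + [minority] * len(pool))
        # Keep only the synthetic examples the stage classifier gets right.
        synthetic = [r for r in pool if predict(stage, r) == minority]

    # Final classifier trained on the raw and the surviving synthetic data.
    final = fit_centroids(X + synthetic, y + [minority] * len(synthetic))
    return final, synthetic


# Toy data: 20 majority points near the origin, 4 minority points near (5, 5).
X = [[(i % 5) * 0.2, (i // 5) * 0.2] for i in range(20)]
y = ["maj"] * 20
X += [[5.0, 5.0], [5.5, 5.2], [6.0, 5.8], [5.2, 6.0]]
y += ["min"] * 4

final_model, synthetic_rows = gradual_oversample(X, y, minority="min")
```

Pruning is re-applied to the whole synthetic pool at every stage, so an example accepted early can still be discarded later once the stage classifier changes. Whether the original method re-checks earlier examples in this way is not stated in the abstract.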
DONG Yuan-Fang, LI Xiong-Fei, LI Jun. Gradually Learning Algorithm for Imbalanced Data[J]. Computer Engineering, 2010, 36(24): 161-163.
董元方, 李雄飞, 李军. 一种不平衡数据渐进学习算法[J]. 计算机工程, 2010, 36(24): 161-163.