Computer Engineering ›› 2008, Vol. 34 ›› Issue (20): 198-199. doi: 10.3969/j.issn.1000-3428.2008.20.072

• Artificial Intelligence and Recognition Technology •

SVM Learning Algorithm Used in Imbalance Data

JIANG Sha, ZHANG Xiao-long   

  1. (School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-10-20 Published: 2008-10-20

Abstract: In practice, training data is often imbalanced: one class is rare relative to the other, and misclassifying the rare class may cost much more than misclassifying the other class. In this situation, classification performance should account for misclassification cost as well as accuracy. This paper extends the Support Vector Machine (SVM) learning method with a Gaussian kernel by using separate penalty parameters, C+ for the rare class and C- for the other class, to train a more sensitive hyperplane, and uses a genetic algorithm to optimize the SVM learning parameters. A new quality measure function is introduced to evaluate the classification results during optimization. Experimental results show that the algorithm performs well on imbalanced data and predicts rare-class samples with relatively high accuracy.

Key words: Support Vector Machine (SVM), imbalanced data, measure function, learning parameter optimization
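
The approach summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the paper's quality measure, so the geometric mean of sensitivity and specificity (`g_mean`) stands in for it. The SVM is a simplified bias-free dual solved by coordinate ascent, and the genetic algorithm's encoding (log10 of C+, C- and the kernel width `gamma`), population size, and mutation scale are all hypothetical choices. Every function name here is invented for the example.

```python
import math
import random


def rbf(a, b, gamma):
    """Gaussian kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def train_weighted_svm(X, y, c_pos, c_neg, gamma, epochs=50):
    """Bias-free kernel SVM dual solved by coordinate ascent (a simplification).

    alpha_i is clipped to [0, C+] when y_i = +1 and to [0, C-] when y_i = -1,
    which is how the two penalty parameters enter the problem.
    """
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            f_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            cap = c_pos if y[i] == 1 else c_neg
            step = (1.0 - y[i] * f_i) / K[i][i]  # Newton step on the dual
            alpha[i] = min(cap, max(0.0, alpha[i] + step))
    return alpha


def predict(X, y, alpha, gamma, x):
    """Sign of the kernel expansion at x (ties go to the positive class)."""
    score = sum(a * t * rbf(s, x, gamma) for a, t, s in zip(alpha, y, X))
    return 1 if score >= 0 else -1


def g_mean(y_true, y_pred):
    """Assumed stand-in measure: sqrt(sensitivity * specificity)."""
    pairs = list(zip(y_true, y_pred))
    pos = sum(1 for t, _ in pairs if t == 1)
    neg = len(pairs) - pos
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    sens = tp / pos if pos else 0.0
    spec = tn / neg if neg else 0.0
    return math.sqrt(sens * spec)


def genetic_search(fitness, rng, pop_size=8, generations=5):
    """Tiny elitist GA over three genes; needs pop_size >= 4."""
    pop = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation.
            children.append([rng.choice(g) + rng.gauss(0.0, 0.3) for g in zip(a, b)])
        pop = parents + children
    return max(pop, key=fitness)


def demo(seed=0):
    """Tune (C+, C-, gamma) on synthetic imbalanced data: 40 vs 8 points."""
    rng = random.Random(seed)

    def cloud(center, n):
        return [(rng.gauss(center, 0.6), rng.gauss(center, 0.6)) for _ in range(n)]

    X = cloud(0.0, 40) + cloud(1.5, 8)
    y = [-1] * 40 + [1] * 8
    train = [i for i in range(len(X)) if i % 2 == 0]
    valid = [i for i in range(len(X)) if i % 2 == 1]
    Xt, yt = [X[i] for i in train], [y[i] for i in train]

    def fitness(genes):
        c_pos, c_neg, gamma = (10.0 ** g for g in genes)
        alpha = train_weighted_svm(Xt, yt, c_pos, c_neg, gamma, epochs=20)
        preds = [predict(Xt, yt, alpha, gamma, X[i]) for i in valid]
        return g_mean([y[i] for i in valid], preds)

    best = genetic_search(fitness, rng)
    return [10.0 ** g for g in best], fitness(best)


if __name__ == "__main__":
    params, score = demo()
    print("C+, C-, gamma:", params, "validation G-mean:", score)
```

Scoring candidates on a held-out split rather than on the training points keeps the search from simply rewarding memorization. Plain accuracy would be a poor fitness here, since always predicting the majority class would already score well on imbalanced data.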
