Abstract:
In practice, training data are often imbalanced: one class is rare relative to the other, and misclassifying a rare-class example may cost far more than misclassifying an example of the other class. In this situation, both classification accuracy and misclassification cost should be considered. This paper extends the Support Vector Machine (SVM) learning method with a Gaussian kernel by using separate penalty parameters, C+ (the weight assigned to the rare class) and C- (the weight assigned to the other class), to train a more sensitive hyperplane; the SVM learning parameters are optimized by a genetic algorithm. A new sensitive quality measure function is also introduced to evaluate classification results during the optimization process. Experimental results show that the optimized algorithm performs competitively on the rare class in imbalanced training data.
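To illustrate why accuracy alone is not enough on imbalanced data, here is a minimal Python sketch. It is not the paper's measure function, which the abstract does not specify. As stand-ins it uses the geometric mean of sensitivity and specificity (a common imbalance metric) and a simple weighted error count. The label convention (+1 rare, -1 other), the helper names, and the cost weights 10 and 1 are all assumptions made for the example.

```python
from math import sqrt

RARE, COMMON = 1, -1  # assumed label convention: +1 is the rare class


def confusion(y_true, y_pred):
    """Return (tp, fn, fp, tn), treating RARE as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == RARE and p == RARE)
    fn = sum(1 for t, p in pairs if t == RARE and p != RARE)
    fp = sum(1 for t, p in pairs if t != RARE and p == RARE)
    tn = sum(1 for t, p in pairs if t != RARE and p != RARE)
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    tp, fn, fp, tn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + fn + fp + tn)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (stand-in measure)."""
    tp, fn, fp, tn = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(sensitivity * specificity)


def weighted_cost(y_true, y_pred, c_pos, c_neg):
    """Total error cost: c_pos per missed rare example, c_neg per false alarm."""
    tp, fn, fp, tn = confusion(y_true, y_pred)
    return c_pos * fn + c_neg * fp


# Toy data: 5 rare examples followed by 95 common ones.
y_true = [RARE] * 5 + [COMMON] * 95
# Classifier A ignores the rare class entirely.
always_common = [COMMON] * 100
# Classifier B finds 4 of 5 rare examples at the price of 10 false alarms.
sensitive = [RARE] * 4 + [COMMON] + [RARE] * 10 + [COMMON] * 85

for name, pred in [("always_common", always_common), ("sensitive", sensitive)]:
    print(name, accuracy(y_true, pred), g_mean(y_true, pred),
          weighted_cost(y_true, pred, c_pos=10, c_neg=1))
```

On this toy data the classifier that never predicts the rare class has the higher accuracy, while the more sensitive classifier scores better on both the geometric mean and the weighted cost. A cost-aware measure of this kind could serve as the fitness that a parameter search such as a genetic algorithm maximizes.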
Key words:
Support Vector Machine (SVM),
imbalanced data,
measure function,
learning parameter optimization
摘要: 在实际应用中的分类数据往往是非平衡数据,少数类别的数据可能有很大的分类代价。分类性能不仅要考虑分类精度,同时要考虑分类代价。该文扩展了支持向量机(SVM)学习方法,对于以高斯核为核函数时的少数类和多数类使用不同的惩罚参数C+, C-以获得高敏感度的超平面,并提出利用遗传算法对SVM的学习参数进行优化调整。给出一种新的评价函数,对分类结果的质量进行评价。实验结果证明,算法对于非平衡数据的分类有较好的效果,对少数类样本预测的准确性较高。
关键词:
支持向量机,
非平衡数据,
评价函数,
学习参数优化
JIANG Sha; ZHANG Xiao-long. SVM Learning Algorithm Used in Imbalance Data[J]. Computer Engineering, 2008, 34(20): 198-199.
蒋 莎;张晓龙. 一种用于非平衡数据的SVM学习算法[J]. 计算机工程, 2008, 34(20): 198-199.