Computer Engineering ›› 2006, Vol. 32 ›› Issue (6): 31-33.

• Doctoral Papers •

基于间隔区域样本数量的加权支持向量机

王 晔,黄上腾   

Weighted SVM Based on the Number of Marginal Samples

WANG Ye, HUANG Shangteng   

  1. Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030
  • Online: 2006-03-20 Published: 2006-03-20

Abstract: This paper analyzes the bias of the optimal hyperplane that arises when a soft-margin SVM (one that tolerates training errors) is trained on imbalanced samples. It finds that the position of the optimal hyperplane is determined by the ratio of positive to negative samples inside the margin, not by the ratio over all samples that traditional weighted SVMs use. When the marginal samples are imbalanced, applying the same tradeoff factor to both classes shifts the optimal hyperplane toward the class with fewer marginal samples. The paper proposes a weighted SVM whose tradeoff factors are tied to the ratio of marginal samples, together with a method for estimating the number of marginal samples in the feature space of the kernel function. Experimental results show that the method improves the classification performance of weighted SVMs.
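
The abstract does not state its formulation, so the following is only a sketch of the setting it appears to describe: the standard two-cost soft-margin SVM, in which each class has its own tradeoff factor. The symbols $C_+$, $C_-$, $N_\pm$ (total samples per class) and $M_\pm$ (samples per class inside the margin region) are notation assumed here, and the last line is one plausible reading of "tradeoff factors related to the number of marginal samples", not necessarily the rule used in the paper.

```latex
% Two-cost soft-margin SVM (standard weighted form)
\min_{w,\,b,\,\xi}\quad
  \frac{1}{2}\lVert w\rVert^{2}
  + C_{+}\sum_{i:\,y_i=+1}\xi_i
  + C_{-}\sum_{i:\,y_i=-1}\xi_i
\qquad\text{s.t.}\quad
  y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,\quad \xi_i\ \ge\ 0 .

% Conventional weighting: ratio taken over all samples
\frac{C_{+}}{C_{-}} = \frac{N_{-}}{N_{+}}

% Assumed margin-based weighting: ratio taken over samples with
% |w^T phi(x) + b| < 1 only
\frac{C_{+}}{C_{-}} = \frac{M_{-}}{M_{+}}
```

Under either rule the class with fewer counted samples receives the larger penalty, which counteracts the drift of the hyperplane toward that class.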

Key words: SVM; Imbalanced samples; Margin; Bias of the optimal hyperplane; Feature space of the kernel function
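
To make the weighting idea in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: the helper names `count_marginal` and `marginal_tradeoff` are hypothetical, the margin region is taken to be the set of samples whose decision value lies strictly between -1 and 1 (with decision values assumed to come from some initial SVM fit), and the normalization `C * total / (2 * count)` is a common balancing heuristic chosen here so that both factors equal `C` when the marginal counts are equal. The paper's own estimation method in the kernel feature space is not reproduced.

```python
from typing import Sequence, Tuple


def count_marginal(decision_values: Sequence[float],
                   labels: Sequence[int]) -> Tuple[int, int]:
    """Count positive and negative samples lying inside the margin region.

    Assumption: a sample is "marginal" when its decision value f(x)
    satisfies -1 < f(x) < 1. Labels are +1 (positive) or -1 (negative).
    """
    m_pos = 0
    m_neg = 0
    for f, y in zip(decision_values, labels):
        if -1.0 < f < 1.0:
            if y > 0:
                m_pos += 1
            else:
                m_neg += 1
    return m_pos, m_neg


def marginal_tradeoff(c: float, m_pos: int, m_neg: int) -> Tuple[float, float]:
    """Return (C_pos, C_neg) with C_pos / C_neg = m_neg / m_pos.

    Assumed heuristic: the class with fewer marginal samples gets the
    larger penalty. If either count is zero the ratio is undefined, so
    the unweighted value c is returned for both classes.
    """
    if m_pos == 0 or m_neg == 0:
        return c, c
    total = m_pos + m_neg
    return c * total / (2 * m_pos), c * total / (2 * m_neg)


if __name__ == "__main__":
    # Toy decision values, invented for illustration only.
    f_values = [0.5, -0.2, 0.9, 1.5, -0.3, -2.0, 0.1]
    y_values = [1, 1, 1, 1, -1, -1, -1]
    m_pos, m_neg = count_marginal(f_values, y_values)
    c_pos, c_neg = marginal_tradeoff(1.0, m_pos, m_neg)
    print(m_pos, m_neg, c_pos, c_neg)
```

In a library that supports per-class costs, the two factors could then be passed as class weights. For example, scikit-learn's `SVC` takes a `class_weight` parameter that scales `C` per class.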