
Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology •

基于置信度代价敏感的支持向量机不均衡数据学习

赵永彬1,陈硕1,刘明2,曹鹏3   

  (1. 国网辽宁省电力有限公司信息通信分公司,沈阳 110006;2. 中国电力财务有限公司,北京 100005;3. 东北大学信息科学与工程学院,沈阳 110819)
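  • Received: 2015-04-27 Publication date: 2015-10-15 Release date: 2015-10-15
  • About the authors: ZHAO Yongbin (born 1975), male, senior engineer, M.S.; main research interests: artificial intelligence and smart grid. CHEN Shuo, engineer, Ph.D. LIU Ming, senior accountant, M.S. CAO Peng, lecturer, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (61302012); Fundamental Research Funds for the Central Universities (N140403004).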

Imbalanced Data Learning for Support Vector Machine Based on Confidence Cost Sensitivity

ZHAO Yongbin 1, CHEN Shuo 1, LIU Ming 2, CAO Peng 3

  (1. Information and Communication Branch of State Grid Liaoning Electric Power Supply Co., Ltd., Shenyang 110006, China; 2. China Electric Power Finance Co., Ltd., Beijing 100005, China; 3. College of Information Science and Engineering, Northeastern University, Shenyang 110819, China)
  • Received: 2015-04-27 Online: 2015-10-15 Published: 2015-10-15

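Abstract (translated from the Chinese original): Imbalanced data are widespread in the real world, and their classification is a research hotspot in machine learning. To improve classification performance on imbalanced data, a cost-sensitive support vector machine classification algorithm based on kernel-space confidence is proposed. By injecting a class misclassification cost mechanism and using an imbalanced-data evaluation metric as the objective function, the algorithm optimizes the misclassification cost factor and raises the recognition rate of minority-class samples. It computes the class confidence of all samples of a class in the kernel space to determine how much each sample contributes to the classification decision, which reduces the influence of noise and isolated points on the support vector machine. Experiments on a large number of UCI datasets show that, compared with other algorithms of the same kind, the proposed algorithm improves classification performance on imbalanced data more effectively.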

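Keywords (translated from the Chinese original): machine learning, classification, imbalanced data learning, support vector machine, cost-sensitive learning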

Abstract: Imbalanced data are common in the real world, and their classification is one of the main research topics in machine learning. To improve the classification performance of the Support Vector Machine (SVM) on imbalanced data, a cost-sensitive SVM based on kernel-space confidence is proposed. The method injects a misclassification cost strategy into training and, using an imbalanced-data evaluation metric as the objective function, optimizes the misclassification cost parameter so as to improve the accuracy on the minority class. Moreover, the weight of each instance's contribution to the classification decision is obtained by calculating its class confidence in the kernel space, which decreases the effect of noisy and outlier instances on the SVM. Experimental results show that the proposed algorithm is highly competitive with other existing methods for imbalanced classification problems.

Key words: machine learning, classification, imbalanced data learning, Support Vector Machine (SVM), cost-sensitive learning
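
The abstract names two ingredients: a per-instance confidence computed in the kernel space, and a class misclassification cost tuned against an imbalanced-data metric. It gives no formulas for either, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes an RBF kernel, a confidence that decays linearly with an instance's kernel-space distance to its own class centre, and G-mean as the evaluation metric. The names `rbf`, `kernel_confidence`, `sample_costs`, and `g_mean`, and the parameters `gamma` and `delta`, are all hypothetical.

```python
import math


def rbf(a, b, gamma=0.5):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2). The kernel choice is an assumption."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def kernel_confidence(X, y, kernel=rbf, delta=1e-6):
    """Assumed confidence: 1 - d / (d_max + delta), where d is the distance from
    phi(x) to the centre of its own class in kernel space."""
    weights = [0.0] * len(X)
    for label in set(y):
        idx = [i for i, t in enumerate(y) if t == label]
        n = len(idx)
        # <mu, mu>: mean of all pairwise kernel values inside the class.
        centre_norm = sum(kernel(X[i], X[j]) for i in idx for j in idx) / (n * n)
        dist = {}
        for i in idx:
            # <phi(x_i), mu>: mean kernel value between x_i and its class.
            cross = sum(kernel(X[i], X[j]) for j in idx) / n
            d2 = kernel(X[i], X[i]) - 2.0 * cross + centre_norm
            dist[i] = math.sqrt(max(d2, 0.0))  # guard against rounding below zero
        d_max = max(dist.values())
        for i in idx:
            weights[i] = 1.0 - dist[i] / (d_max + delta)
    return weights


def sample_costs(weights, y, minority_cost, minority=1):
    """Combine the class cost factor with the per-instance confidence."""
    return [w * (minority_cost if t == minority else 1.0) for w, t in zip(weights, y)]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the recall on the positive and negative classes."""
    pairs = list(zip(y_true, y_pred))
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return math.sqrt((tp / n_pos) * (tn / n_neg))


# Toy data: class 0 is the majority and contains one outlier at (3.0, 3.0).
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0),
     (2.0, 2.0), (2.1, 2.0), (2.0, 2.2)]
y = [0, 0, 0, 0, 1, 1, 1]
weights = kernel_confidence(X, y)
costs = sample_costs(weights, y, minority_cost=4.0)
```

Per-instance costs of this kind could be handed to any SVM trainer that accepts instance weights, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`. The value of `minority_cost` could then be chosen by searching for the setting that maximizes `g_mean` on held-out data.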

Chinese Library Classification (CLC) number: