
Computer Engineering, 2013, Vol. 39, Issue (8): 87-91. doi: 10.3969/j.issn.1000-3428.2013.08.018

• Architecture and Software Technology •

Software Defect Prediction Based on Balanced and Biased Support Vector Machine

LI Qian-ru 1,2, YAO Wei 3

  (1. Department of Command Information System, Xi'an Communication Institute, Xi'an 710106, China; 2. Institute of Information Security, Sichuan University, Chengdu 610064, China; 3. Information Network Security Research and Development Center, The Third Research Institute of the Ministry of Public Security, Shanghai 201204, China)
  • Received: 2012-02-02 Publication date: 2013-08-15 Release date: 2013-08-13
  • About the authors: LI Qian-ru (born 1983), female, M.S., research interests: machine learning, software testing; YAO Wei (corresponding author), assistant research fellow

Abstract: Software defect prediction faces two problems. First, labeled training samples are scarce, because it is difficult to collect enough labeled data to learn a good model. Second, the sample set is class-imbalanced, because a software system contains far fewer defective modules than non-defective ones. To address both problems, this paper proposes a semi-supervised learning approach named Balanced and Biased Support Vector Machine (B2SVM). The method learns from both a labeled and an unlabeled sample set: a Biased Support Vector Machine (BSVM) is trained on the small, imbalanced labeled set, and the abundant unlabeled samples are exploited to improve prediction accuracy. In each iteration of the semi-supervised learning process, a resampling strategy rebalances the sample set so that the large, imbalanced pool of unlabeled samples does not degrade prediction performance. Experimental results on benchmark class-imbalanced datasets show that the method performs software defect prediction effectively on class-imbalanced sample sets.

Key words: machine learning, semi-supervised learning, software defect prediction, Biased Support Vector Machine (BSVM), resampling
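The abstract describes B2SVM only at a high level: a BSVM trained on a small, imbalanced labeled set, semi-supervised iterations that draw on unlabeled samples, and resampling to rebalance the data in each iteration. The Python sketch below illustrates that loop under several assumptions that do not come from the paper. The BSVM is taken to be a linear SVM with separate penalties `c_pos` and `c_neg` for the defective (+1) and non-defective (-1) classes, trained by plain subgradient descent instead of a QP solver. Resampling is assumed to be random oversampling of the minority class. The semi-supervised step is assumed to be self-training, which pseudo-labels the unlabeled modules with the largest |f(x)| in each round. All function names and parameter values are hypothetical.

```python
import random

# Hypothetical helpers: an illustrative sketch, not the authors' B2SVM code.


def decision(w, b, x):
    """Linear decision value f(x) = w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(w, b, x):
    """+1 = defective module, -1 = non-defective module."""
    return 1 if decision(w, b, x) >= 0 else -1


def train_biased_svm(X, y, c_pos, c_neg, epochs=300, lr=0.01):
    """Linear SVM with class-dependent penalties (assumed BSVM form).

    Minimizes 0.5*||w||^2 + c_pos * (hinge losses on y=+1 samples)
    + c_neg * (hinge losses on y=-1 samples) by full-batch subgradient
    descent.
    """
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # start from the gradient of 0.5*||w||^2
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # hinge loss is active
                c = c_pos if yi == 1 else c_neg
                for j in range(dim):
                    gw[j] -= c * yi * xi[j]
                gb -= c * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def oversample(X, y, rng):
    """Random oversampling: duplicate minority-class samples until balanced."""
    pos = [(x, t) for x, t in zip(X, y) if t == 1]
    neg = [(x, t) for x, t in zip(X, y) if t == -1]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = []
    if minority:
        extra = [rng.choice(minority)
                 for _ in range(len(majority) - len(minority))]
    data = pos + neg + extra
    return [x for x, _ in data], [t for _, t in data]


def self_train(X_lab, y_lab, X_unl, rounds=5, per_round=2,
               c_pos=2.0, c_neg=1.0, seed=0):
    """Rebalance, fit, then pseudo-label the most confident unlabeled samples."""
    rng = random.Random(seed)
    X_lab, y_lab, X_unl = list(X_lab), list(y_lab), list(X_unl)

    def fit():
        Xb, yb = oversample(X_lab, y_lab, rng)
        return train_biased_svm(Xb, yb, c_pos, c_neg)

    w, b = fit()
    for _ in range(rounds):
        if not X_unl:
            break
        # |f(x)| is proportional to the distance from the current hyperplane.
        X_unl.sort(key=lambda x: abs(decision(w, b, x)), reverse=True)
        picked, X_unl = X_unl[:per_round], X_unl[per_round:]
        for x in picked:
            X_lab.append(x)
            y_lab.append(predict(w, b, x))
        w, b = fit()  # retrain on the enlarged, rebalanced labeled set
    return w, b


# Toy usage: 2 defective vs 4 non-defective labeled modules, 4 unlabeled.
X_lab = [[2, 2], [3, 3], [-2, -2], [-3, -1], [-1, -3], [-2, -3]]
y_lab = [1, 1, -1, -1, -1, -1]
X_unl = [[2.5, 2], [-2.5, -2], [-1.5, -2.5], [3, 2.5]]
w, b = self_train(X_lab, y_lab, X_unl)
print(predict(w, b, [5, 5]), predict(w, b, [-5, -5]))
```

Oversampling the minority class and raising `c_pos` both push the model toward the defective class, so in a real experiment the two would normally be tuned together (for example by cross-validation) rather than fixed at the illustrative defaults used here.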
