Software Defect Prediction Based on Balanced and Biased Support Vector Machine

doi: 10.3969/j.issn.1000-3428.2013.08.018

Computer Engineering, 2013, Vol. 39, Issue (8): 87-91.

LI Qian-ru^1,2, YAO Wei^3

(1. Department of Command Information System, Xi'an Communication Institute, Xi'an 710106, China; 2. Institute of Information Security, Sichuan University, Chengdu 610064, China; 3. Information Network Security Research and Development Center, Third Research Institute of Ministry of Public Security, Shanghai 201204, China)

Received: 2012-02-02 Online: 2013-08-15 Published: 2013-08-13
About the authors: LI Qian-ru (born 1983), female, M.S., main research interests: machine learning and software testing; YAO Wei (corresponding author), assistant researcher

Abstract

Abstract: Software defect prediction faces two problems. Labeled training samples are scarce, which makes it hard to learn a good model, and the data are class-imbalanced, because a software system contains far fewer defective modules than non-defective ones. To address both problems, this paper proposes a semi-supervised learning approach named Balanced and Biased Support Vector Machine (B2SVM). The method performs semi-supervised learning over the labeled and unlabeled sample sets, using a Biased Support Vector Machine (BSVM) to generalize from the small, imbalanced labeled set. During the semi-supervised iterations, a resampling strategy balances the sample set to remove the effect of the large, imbalanced unlabeled set on prediction performance. Experimental results on benchmark datasets show that the method can effectively predict software defects on class-imbalanced sample sets.

Key words: machine learning, semi-supervised learning, software defect prediction, Biased Support Vector Machine (BSVM), resampling
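
The abstract outlines the procedure only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the "biased" SVM can be approximated by a linear SVM with a class-weighted hinge loss (a higher cost for errors on defective modules), and that the resampling step can be approximated by adding the same number of confident pseudo-labeled samples per class in each round. The function names, parameter values, and toy data are all hypothetical.

```python
import random

def decision(w, b, x):
    """Signed score w.x + b; a positive score means 'defective'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_biased_svm(X, y, c_pos=3.0, c_neg=1.0, lam=0.01, lr=0.02,
                     epochs=300, seed=0):
    """Linear SVM fitted by SGD on a class-weighted hinge loss (assumed stand-in
    for BSVM). Labels: +1 defective, -1 non-defective. c_pos > c_neg makes a
    margin violation on a defective module cost more."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            cost = c_pos if y[i] == 1 else c_neg
            w = [wi * (1.0 - lr * lam) for wi in w]      # L2 shrinkage
            if y[i] * decision(w, b, X[i]) < 1.0:        # margin violated
                w = [wi + lr * cost * y[i] * xi for wi, xi in zip(w, X[i])]
                b += lr * cost * y[i]
    return w, b

def self_train_balanced(X_lab, y_lab, X_unl, rounds=5, per_class=2):
    """Self-training loop: each round pseudo-labels the most confident unlabeled
    samples, taking the SAME number from each class so that the growing
    training set is not swamped by the majority class (assumed balancing rule)."""
    X_lab, y_lab = [list(x) for x in X_lab], list(y_lab)
    pool = [list(x) for x in X_unl]
    w, b = train_biased_svm(X_lab, y_lab)
    for _ in range(rounds):
        scores = [(decision(w, b, x), i) for i, x in enumerate(pool)]
        pos = sorted((s, i) for s, i in scores if s > 0)[::-1]  # most confident first
        neg = sorted((s, i) for s, i in scores if s <= 0)       # most confident first
        k = min(per_class, len(pos), len(neg))
        if k == 0:                      # one class exhausted: stop adding samples
            break
        picked = {i: 1 for _, i in pos[:k]}
        picked.update({i: -1 for _, i in neg[:k]})
        for i, label in picked.items():
            X_lab.append(pool[i])
            y_lab.append(label)
        pool = [x for i, x in enumerate(pool) if i not in picked]
        w, b = train_biased_svm(X_lab, y_lab)
    return w, b

def predict(model, x):
    w, b = model
    return 1 if decision(w, b, x) > 0 else -1

# Made-up toy data: two software metrics per module, few defective (+1) samples.
X_lab = [[0.0, 0.2], [0.3, 0.1], [0.5, 0.4], [0.2, 0.6], [0.7, 0.3], [0.4, 0.8],
         [3.0, 3.2], [3.5, 2.9]]
y_lab = [-1, -1, -1, -1, -1, -1, 1, 1]
X_unl = [[0.1, 0.5], [0.6, 0.6], [0.9, 0.2], [0.3, 0.3], [0.8, 0.7], [0.2, 0.9],
         [3.2, 3.4], [2.8, 3.1], [3.6, 3.3]]

model = self_train_balanced(X_lab, y_lab, X_unl)
print(predict(model, [3.3, 3.0]), predict(model, [0.4, 0.4]))
```

In this sketch the loop stops as soon as one class has no confident unlabeled samples left, so surplus majority-class samples are simply left unused. The paper's actual resampling strategy may differ.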

CLC number: TP311

LI Qian-ru, YAO Wei. Software Defect Prediction Based on Balanced and Biased Support Vector Machine[J]. Computer Engineering, 2013, 39(8): 87-91.

http://www.ecice06.com/CN/Y2013/V39/I8/87

Editor: LU Yan-fei
