Computer Engineering ›› 2022, Vol. 48 ›› Issue (3): 175-180. doi: 10.19678/j.issn.1000-3428.0061316

• Computer Architecture and Software Technology •

Cost Sensitive Boosting Software Defect Prediction Method

LI Li, REN Zhenkang, SHI Kexin   

  1. College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
  • Received: 2021-03-30 Revised: 2021-06-12 Published: 2021-07-09

代价敏感的Boosting软件缺陷预测方法

李莉, 任振康, 石可欣   

  1. 东北林业大学 信息与计算机工程学院, 哈尔滨 150040
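  • About the authors: LI Li (born 1977), female, associate professor, Ph.D.; her main research interests are advanced software engineering technology, swarm intelligence optimization, and large-scale distributed computing. REN Zhenkang and SHI Kexin are master's students.
  • Supported by:
    Key Project of the Heilongjiang Province Education Science Plan (GJB1421251).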

Abstract: Software defect prediction can effectively improve software reliability and help fix vulnerabilities in a system. Boosting resampling is a common way to address the shortage of samples in software defect prediction, but conventional Boosting methods perform poorly on the class imbalance problem in this domain. To address this, a cost-sensitive Boosting software defect prediction method named CSBst is proposed in this study. Because missed detections (false negatives) and false alarms (false positives) on defective modules carry different costs, the cost-sensitive Boosting method updates the sample weights so that the weights of samples of the first error type are increased above those of non-defective samples and of samples of the second error type, thereby improving the module prediction rate. The threshold moving method is used to integrate the classification results of multiple decision tree base classifiers in order to counter overfitting. On this basis, the optimal settings of the weights and the threshold used in model construction are derived analytically. Experiments on the NASA software defect prediction datasets show that, with small samples, the BAL prediction index of CSBst is 7% and 3% higher than that of the CSBKNN and CSCE methods, respectively, and that its time complexity is one order of magnitude lower.
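The general idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' CSBst implementation: the abstract does not give the exact weight update rule or the optimal weight and threshold settings, so everything below is an assumption made for illustration. In particular, the function names (`fit_stump`, `train_cs_boost`, `predict`, `balance`), the `fn_cost` multiplier, and the use of one-level decision stumps in place of full decision trees are hypothetical, and the sketch assumes that the first error type refers to a defective module predicted as non-defective. The `balance` function uses the definition of BAL commonly found in the defect prediction literature, which may differ in detail from the one used in the paper.

```python
import math

def fit_stump(X, y, w):
    """Find the one-feature threshold rule with the lowest weighted error.
    Labels are +1 (defective) and -1 (non-defective)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                pred = [pol if row[j] > t else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best  # (weighted error, feature index, threshold, polarity)

def stump_predict(stump, row):
    _, j, t, pol = stump
    return pol if row[j] > t else -pol

def train_cs_boost(X, y, rounds=10, fn_cost=2.0):
    """AdaBoost-style loop with an extra cost factor on missed defects.
    fn_cost > 1 is a hypothetical knob, not a value taken from the paper."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        new_w = []
        for wi, row, yi in zip(w, X, y):
            p = stump_predict(stump, row)
            factor = math.exp(-alpha * yi * p)  # standard AdaBoost reweighting
            if yi == 1 and p == -1:
                factor *= fn_cost  # assumed extra penalty for a missed defect
            new_w.append(wi * factor)
        total = sum(new_w)
        w = [wi / total for wi in new_w]
    return ensemble

def predict(ensemble, row, threshold=0.0):
    """Weighted vote scaled to [-1, 1]; lowering the threshold below 0
    moves the decision boundary in favour of the defective class."""
    total = sum(a for a, _ in ensemble)
    score = sum(a * stump_predict(s, row) for a, s in ensemble) / total
    return 1 if score > threshold else -1

def balance(y_true, y_pred):
    """BAL = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2).
    Assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    pd = tp / (tp + fn)  # probability of detection (recall on defects)
    pf = fp / (fp + tn)  # probability of false alarm
    return 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)

if __name__ == "__main__":
    # Toy, made-up data: one metric per module, two defective modules out of eight.
    X = [[1], [2], [3], [4], [5], [6], [7], [8]]
    y = [-1, -1, -1, -1, -1, -1, 1, 1]
    model = train_cs_boost(X, y, rounds=3)
    preds = [predict(model, row, threshold=-0.2) for row in X]
    print("BAL on the toy data:", balance(y, preds))
```

In this sketch the cost sensitivity enters in two places: during training, through the larger weight given to missed defects, and at prediction time, through the movable threshold on the ensemble score. How the paper balances these two settings is not stated in the abstract.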

Key words: software defect prediction, decision tree, machine learning, threshold moving method, Boosting method
