Computer Engineering, 2019, Vol. 45, Issue 8: 80-85. doi: 10.19678/j.issn.1000-3428.0053297

• Architecture and Software Technology •

Severity Assessment of Software Defect Reports Based on Feature Selection

LIU Wenjie, JIANG He

  1. School of Software Technology, Dalian University of Technology, Dalian, Liaoning 116621, China
  • Received: 2018-12-03 Revised: 2019-01-11 Online: 2019-08-15 Published: 2019-08-08
  • About the authors: LIU Wenjie (born 1979), male, engineer, M.S.; main research interests are software testing and network engineering. JIANG He, professor, Ph.D., doctoral supervisor.
  • Funding:
    National Natural Science Foundation of China, "Research on Multi-Perspective Analysis and Applications of Hyper-Heuristic Algorithms" (61175062).

Abstract: For the Eclipse project defect report dataset from the Bugzilla defect tracking system, feature selection and machine learning algorithms are applied to the vectorized raw data to reduce feature dimensionality and optimize feature weights, yielding an optimized dataset of lower dimensionality. Classification algorithms are then used to assess the severity of the software defect reports. A cross-comparison of four feature selection algorithms and four machine learning algorithms shows that optimizing the features of the original dataset with the Information Gain (IG) feature selection algorithm, then training and testing on the optimized dataset with the Multinomial Naive Bayes (MNB) algorithm, raises the AUROC of severity prediction to 0.767.

Key words: open source software, software defect report, feature selection, machine learning, severity assessment, repair rate
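
The combination named in the abstract (IG feature selection, then MNB, scored by AUROC) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy reports, the binary severe/non-severe labels, the use of term presence for IG, the Laplace smoothing, and all function names are assumptions made only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the report' (assumed feature form)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    cond = sum(len(p) / n * entropy(p) for p in (with_term, without) if p)
    return entropy(labels) - cond

def select_terms(docs, labels, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab, key=lambda t: -information_gain(docs, labels, t))[:k]

def train_mnb(docs, labels, terms):
    """Multinomial naive Bayes over the selected terms, with Laplace smoothing."""
    prior, cond = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter(t for d in class_docs for t in d if t in terms)
        total = sum(counts.values()) + len(terms)
        prior[c] = math.log(len(class_docs) / len(docs))
        cond[c] = {t: math.log((counts[t] + 1) / total) for t in terms}
    return prior, cond

def severe_score(model, doc, positive=1, negative=0):
    """Log-odds of the positive (severe) class; unselected terms are ignored."""
    prior, cond = model
    def loglik(c):
        return prior[c] + sum(cond[c][t] for t in doc if t in cond[c])
    return loglik(positive) - loglik(negative)

def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical tokenized defect reports: 1 = severe, 0 = non-severe.
train_docs = [
    ["crash", "on", "startup"],
    ["data", "loss", "after", "crash"],
    ["typo", "in", "dialog"],
    ["wrong", "icon", "in", "toolbar"],
]
train_labels = [1, 1, 0, 0]
test_docs = [["crash", "in", "editor"], ["typo", "in", "toolbar"]]
test_labels = [1, 0]

terms = select_terms(train_docs, train_labels, k=3)
model = train_mnb(train_docs, train_labels, terms)
scores = [severe_score(model, d) for d in test_docs]
print(terms, auroc(scores, test_labels))
```

Selecting terms before training shrinks the vocabulary the classifier sees, which is the dimensionality reduction the abstract refers to. On a real Bugzilla dataset the value of `k` and the tokenization would need tuning, and the numbers from this toy data say nothing about the reported 0.767.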
