Severity Assessment of Software Defect Reports Based on Feature Selection

Computer Engineering, 2019, Vol. 45, Issue 8: 80-85. doi: 10.19678/j.issn.1000-3428.0053297


LIU Wenjie, JIANG He

School of Software Technology, Dalian University of Technology, Dalian, Liaoning 116621, China

Received: 2018-12-03  Revised: 2019-01-11  Online: 2019-08-15  Published: 2019-08-08
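About the authors: LIU Wenjie (1979-), male, engineer, M.S.; research interests: software testing and network engineering. JIANG He, professor, Ph.D., doctoral supervisor.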
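Funding: National Natural Science Foundation of China project "Multi-Perspective Analysis and Applications of Hyper-Heuristic Algorithms" (61175062).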


Abstract

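Taking the software defect report dataset of the Eclipse project in the Bugzilla defect tracking system as its subject, this paper applies feature selection and machine learning algorithms to the vectorized raw data for feature dimensionality reduction and weight optimization, obtaining an optimized dataset of lower dimensionality, and then uses classification algorithms to assess the severity of the software defect reports. A cross-comparison of the results of four feature selection algorithms and four machine learning algorithms shows that optimizing the features of the raw dataset with the Information Gain (IG) feature selection algorithm, and then training and testing on the optimized dataset with the Multinomial Naive Bayes (MNB) algorithm, raises the AUROC of software defect report severity prediction to 0.767.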

Key words: open-source software, software defect report, feature selection, machine learning, severity assessment, repair rate
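The abstract names the pipeline (vectorize the reports, keep the terms with the highest information gain, train Multinomial Naive Bayes, evaluate by AUROC) but gives no implementation details. The following Python sketch illustrates that general technique under stated assumptions; it is not the authors' code. Assumed here: binary labels ("severe" vs "non-severe"), whitespace-tokenized report text, information gain computed on term presence, Laplace smoothing with `alpha=1.0`, and a hypothetical six-report training set with `k=4` selected terms. All data and function names (`information_gain`, `select_terms`, `train_mnb`, `severity_score`, `auroc`) are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy data: tokenized report summaries with an assumed
# binary label ("severe" vs "non-severe"); not taken from the paper.
train_docs = [s.split() for s in [
    "crash on startup null pointer",
    "workbench crash data loss",
    "editor hang then crash",
    "typo in preferences label",
    "toolbar icon misaligned",
    "label typo in dialog",
]]
train_labels = ["severe"] * 3 + ["non-severe"] * 3


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the report' w.r.t. the label."""
    classes = sorted(set(labels))

    def class_counts(ys):
        counts = Counter(ys)
        return [counts[k] for k in classes]

    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Conditional entropy H(Y | term present/absent), weighted by partition size.
    h_cond = sum(len(part) / n * entropy(class_counts(part))
                 for part in (with_t, without_t) if part)
    return entropy(class_counts(labels)) - h_cond


def select_terms(docs, labels, k):
    """Keep the k terms with the highest IG (ties keep alphabetical order)."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


def train_mnb(docs, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing over the selected terms."""
    vocab = set(vocab)
    model = {}
    for c in sorted(set(labels)):
        counts = Counter(t for d, y in zip(docs, labels) if y == c
                         for t in d if t in vocab)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_prior = math.log(labels.count(c) / len(labels))
        model[c] = (log_prior,
                    {t: math.log((counts[t] + alpha) / denom) for t in vocab})
    return model


def severity_score(model, doc, positive="severe", negative="non-severe"):
    """Log-odds of the positive class; terms outside the selection are ignored."""
    def log_joint(c):
        log_prior, loglik = model[c]
        return log_prior + sum(loglik[t] for t in doc if t in loglik)
    return log_joint(positive) - log_joint(negative)


def auroc(scores, truth):
    """AUROC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, truth) if y]
    neg = [s for s, y in zip(scores, truth) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


terms = select_terms(train_docs, train_labels, k=4)
model = train_mnb(train_docs, train_labels, terms)

test_docs = [s.split() for s in [
    "crash when saving file",
    "null pointer crash in parser",
    "typo in about dialog",
    "wrong label color",
]]
test_truth = [True, True, False, False]  # True = severe
auc = auroc([severity_score(model, d) for d in test_docs], test_truth)
print(terms)  # ['crash', 'in', 'label', 'typo']
print(auc)    # 1.0
```

On this toy data "crash" separates the classes perfectly (IG = 1 bit), while the stop word "in" survives selection (tied with "label" and "typo") only because no stop-word filtering is assumed. A realistic setup over Bugzilla reports would likely add preprocessing and a cross-validated evaluation instead of four hand-written test reports.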


CLC number: TP39

LIU Wenjie, JIANG He. Severity Assessment of Software Defect Reports Based on Feature Selection[J]. Computer Engineering, 2019, 45(8): 80-85.

https://www.ecice06.com/CN/Y2019/V45/I8/80

