Cross-Project Defect Prediction Method Based on Instance Filtering and Transfer

Computer Engineering, 2020, Vol. 46, Issue 8: 197-202, 209. doi: 10.19678/j.issn.1000-3428.0055054


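FAN Guisheng¹,², DIAO Xuyang¹, YU Huiqun¹, CHEN Liqiong³

1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China;
2. Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai 201112, China;
3. Department of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China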

Received: 2019-05-30; Revised: 2019-08-06; Published: 2019-08-14
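About the authors: FAN Guisheng (born 1980), male, associate professor, Ph.D., main research interest: trusted computing; DIAO Xuyang (corresponding author), master's student; YU Huiqun, professor and doctoral supervisor; CHEN Liqiong, lecturer.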
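Funding: National Natural Science Foundation of China (61702334, 61772200); Shanghai Pujiang Program (17PJ1401900); Natural Science Foundation of Shanghai (17ZR1406900, 17ZR1429700); Education and Scientific Research Fund of East China University of Science and Technology (ZH1726108).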



Abstract: In cross-project software defect prediction, the original datasets collected and labeled by humans often contain noisy data, and there are large distribution differences between the data of the source project and those of the target project. To address these problems, this paper proposes a two-stage cross-project defect prediction method called CLNI-KMM. In the instance filtering stage, noisy instances are filtered out by the CLNI algorithm. In the instance transfer stage, the KMM algorithm is used to adjust the training weights of the instances in the source project. On this basis, a software defect prediction model is built by combining the reweighted source data with a small proportion of labeled instances from the target project. Experimental results show that, compared with the classical cross-project software defect prediction methods TCA, TNB and NNFilter, the proposed method achieves better prediction performance and stronger stability.

Key words: Cross-Project Defect Prediction (CPDP), noisy data, distribution difference, instance filtering, instance transfer
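The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions. The CLNI filter is a simplified reading of Closest List Noise Identification: an instance is flagged when most of its nearest neighbours carry a different label, and this is repeated until the flagged set stabilises. KMM (Kernel Mean Matching) is solved here with plain projected gradient descent and a Gaussian kernel rather than a dedicated QP solver. The function names (`clni_filter`, `kmm_weights`, `build_training_set`), the parameter values (`k`, `theta`, `eps`, `gamma`, `B`), the use of unlabeled target instances for matching, and the weight of 1.0 for labeled target instances are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def clni_filter(X: List[Vector], y: List[int], k: int = 5, theta: float = 0.6,
                eps: float = 0.99, max_iter: int = 10) -> List[int]:
    """Simplified CLNI-style filter; returns indices of instances kept as clean.

    An instance is flagged as noise when at least a fraction `theta` of its k
    nearest neighbours (ignoring instances flagged in the previous round) carry
    a different label. Rounds repeat until the flagged set stabilises.
    """
    noisy: Set[int] = set()
    for _ in range(max_iter):
        flagged: Set[int] = set()
        for i, xi in enumerate(X):
            cands = [j for j in range(len(X)) if j != i and j not in noisy]
            nearest = sorted(cands, key=lambda j: sq_dist(xi, X[j]))[:k]
            if nearest and sum(y[j] != y[i] for j in nearest) / len(nearest) >= theta:
                flagged.add(i)
        # Stop when consecutive noise sets (almost) coincide.
        overlap = len(flagged & noisy) / max(len(flagged), 1)
        done = flagged == noisy or overlap >= eps
        noisy = flagged
        if done:
            break
    return [i for i in range(len(X)) if i not in noisy]


def rbf(a: Vector, b: Vector, gamma: float) -> float:
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sq_dist(a, b))


def project(v: List[float], B: float, lo: float, hi: float) -> List[float]:
    """Euclidean projection onto {0 <= b_i <= B, lo <= sum(b) <= hi}."""
    def clipped(lam: float) -> List[float]:
        return [min(max(x - lam, 0.0), B) for x in v]

    s = sum(clipped(0.0))
    if lo <= s <= hi:
        return clipped(0.0)
    target = hi if s > hi else lo
    # sum(clipped(lam)) is non-increasing in lam, so bisect on the shift lam.
    lam_lo, lam_hi = min(v) - B, max(v)
    for _ in range(100):
        mid = (lam_lo + lam_hi) / 2
        if sum(clipped(mid)) > target:
            lam_lo = mid
        else:
            lam_hi = mid
    return clipped((lam_lo + lam_hi) / 2)


def kmm_weights(Xs: List[Vector], Xt: List[Vector], gamma: float = 1.0,
                B: float = 10.0, eps: Optional[float] = None,
                n_iter: int = 500) -> List[float]:
    """KMM: min 0.5*b'Kb - kappa'b s.t. 0 <= b_i <= B, |sum(b) - ns| <= ns*eps."""
    ns, nt = len(Xs), len(Xt)
    if eps is None:
        eps = (math.sqrt(ns) - 1) / math.sqrt(ns)  # assumed default tolerance
    K = [[rbf(a, b, gamma) for b in Xs] for a in Xs]
    kappa = [ns / nt * sum(rbf(a, b, gamma) for b in Xt) for a in Xs]
    lr = 1.0 / ns  # safe step: 0 < K_ij <= 1, so K's largest eigenvalue is <= ns
    beta = [1.0] * ns
    for _ in range(n_iter):
        grad = [sum(K[i][j] * beta[j] for j in range(ns)) - kappa[i] for i in range(ns)]
        beta = project([b - lr * g for b, g in zip(beta, grad)],
                       B, ns * (1 - eps), ns * (1 + eps))
    return beta


def build_training_set(Xs: List[Vector], ys: List[int], Xt_unlabeled: List[Vector],
                       Xt_labeled: List[Vector], yt_labeled: List[int],
                       k: int = 5) -> List[Tuple[Vector, int, float]]:
    """Hypothetical pipeline: CLNI filtering, KMM reweighting, then merging."""
    keep = clni_filter(Xs, ys, k=k)
    Xs_f = [Xs[i] for i in keep]
    ys_f = [ys[i] for i in keep]
    w = kmm_weights(Xs_f, Xt_unlabeled)
    data = [(x, c, wi) for x, c, wi in zip(Xs_f, ys_f, w)]
    # Labeled target instances get weight 1.0 (an assumption of this sketch).
    data += [(x, c, 1.0) for x, c in zip(Xt_labeled, yt_labeled)]
    return data
```

The returned (features, label, weight) triples could then be fed to any classifier that accepts per-sample weights. The abstract does not name the classifier used in the paper, so that choice is left open here.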

CLC number: TP18


FAN Guisheng, DIAO Xuyang, YU Huiqun, CHEN Liqiong. Cross-Project Defect Prediction Method Based on Instance Filtering and Transfer[J]. Computer Engineering, 2020, 46(8): 197-202,209.

http://www.ecice06.com/CN/Y2020/V46/I8/197


