Computer Engineering, 2020, Vol. 46, Issue 8: 197-202, 209. doi: 10.19678/j.issn.1000-3428.0055054

• Architecture and Software Technology •

Cross-Project Defect Prediction Method Based on Instance Filtering and Transfer

FAN Guisheng1,2, DIAO Xuyang1, YU Huiqun1, CHEN Liqiong3

  1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China;
  2. Shanghai Key Laboratory of Computer Software Testing and Evaluating, Shanghai 201112, China;
  3. Department of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China
  • Received: 2019-05-30 Revised: 2019-08-06 Published: 2019-08-14
  • About the authors: FAN Guisheng (born 1980), male, associate professor, Ph.D., main research interest: trusted computing; DIAO Xuyang (corresponding author), master's student; YU Huiqun, professor, doctoral supervisor; CHEN Liqiong, lecturer.
  • Funding:
    National Natural Science Foundation of China (61702334, 61772200); Shanghai Pujiang Talent Program (17PJ1401900); Natural Science Foundation of Shanghai (17ZR1406900, 17ZR1429700); Education and Scientific Research Fund of East China University of Science and Technology (ZH1726108).

Abstract: In cross-project software defect prediction, original datasets that are collected and labeled manually often contain noisy data, and the data distributions of the source project and the target project differ considerably. To address these problems, this paper proposes a two-stage cross-project defect prediction method called CLNI-KMM. In the instance filtering stage, noisy instances are filtered out using the CLNI algorithm. In the instance transfer stage, the KMM algorithm is used to adjust the training weights of instances in the source project, and a software defect prediction model is then built by combining these weighted instances with a small number of labeled instances from the target project. Experimental results show that, compared with the classical cross-project software defect prediction methods TCA, TNB and NNFilter, CLNI-KMM achieves better prediction performance and is more stable.
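The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `clni_filter` and `kmm_weights`, the parameters `k`, `theta`, `sigma` and `B`, and the toy data are all assumptions made for illustration. CLNI (Closest List Noise Identification) is reduced here to a repeated nearest-neighbour label check, and the KMM (Kernel Mean Matching) quadratic program is solved only approximately by projected gradient descent under a box constraint, with the usual constraint on the sum of the weights omitted. Combining the weighted source instances with labeled target instances and training the classifier is left out.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clni_filter(X, y, k=3, theta=0.6, max_iter=10):
    """Return indices kept after a simplified CLNI-style noise filter.

    An instance is flagged as noise when at least `theta` of its `k` nearest
    non-noisy neighbours carry a different label. The pass is repeated until
    the noise set stops changing (an assumed, simplified stopping rule).
    """
    noise = set()
    for _ in range(max_iter):
        new_noise = set()
        for i in range(len(X)):
            # Neighbour candidates skip the instance itself and known noise.
            cand = [j for j in range(len(X)) if j != i and j not in noise]
            cand.sort(key=lambda j: sq_dist(X[i], X[j]))
            nearest = cand[:k]
            if not nearest:
                continue
            disagree = sum(1 for j in nearest if y[j] != y[i]) / len(nearest)
            if disagree >= theta:
                new_noise.add(i)
        if new_noise == noise:
            break
        noise = new_noise
    return [i for i in range(len(X)) if i not in noise]


def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel value between two feature vectors."""
    return math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2))


def kmm_weights(Xs, Xt, sigma=1.0, B=10.0, n_iter=500):
    """Approximate KMM weights for source instances by projected gradient.

    Minimises 0.5 * b^T K b - kappa^T b subject to 0 <= b_i <= B.
    The constraint on the sum of the weights is omitted in this sketch.
    """
    ns, nt = len(Xs), len(Xt)
    K = [[rbf(a, b, sigma) for b in Xs] for a in Xs]
    kappa = [ns / nt * sum(rbf(a, t, sigma) for t in Xt) for a in Xs]
    # K has non-negative entries, so its largest row sum bounds its largest
    # eigenvalue; a step of 1 / that bound keeps the iteration stable.
    step = 1.0 / max(sum(row) for row in K)
    beta = [1.0] * ns
    for _ in range(n_iter):
        grad = [sum(K[i][j] * beta[j] for j in range(ns)) - kappa[i]
                for i in range(ns)]
        # Gradient step followed by projection onto the box [0, B].
        beta = [min(B, max(0.0, b - step * g)) for b, g in zip(beta, grad)]
    return beta


# Toy source project: two clusters plus one mislabelled instance at index 4.
Xs = [[0.0], [0.1], [0.2], [0.3], [0.15], [10.0], [10.1], [10.2], [10.3]]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1]
Xt = [[0.0], [0.1], [0.05]]  # toy target project, concentrated near 0

kept = clni_filter(Xs, ys)                        # stage 1: drop noisy instances
weights = kmm_weights([Xs[i] for i in kept], Xt)  # stage 2: reweight the rest
```

In this toy setting the mislabelled instance is filtered out in stage 1, and source instances that lie close to the target data end up with larger weights than those far from it. In a full pipeline, such weights would typically be supplied as per-instance training weights when fitting the prediction model.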

Key words: Cross-Project Defect Prediction (CPDP), noisy data, distribution difference, instance filtering, instance transfer

CLC number: