Computer Engineering, 2021, Vol. 47, Issue (12): 230-235, 248. doi: 10.19678/j.issn.1000-3428.0060389

• Architecture and Software Technology •

Software Defect Prediction via Heavy Son Node-based Abstract Syntax Tree

HUANG Xiaowei1,2, FAN Guisheng1, YU Huiqun1, YANG Xingguang1

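  1. Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China;
    2. Key Laboratory of Computer Software Testing and Evaluating, Shanghai 201112, China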
  • Received: 2020-12-24  Revised: 2021-01-29  Published: 2021-02-23
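  • About the authors: HUANG Xiaowei (born 1995), male, master's student, main research interest: software defect prediction; FAN Guisheng (corresponding author), associate research fellow, Ph.D.; YU Huiqun, professor, Ph.D. supervisor; YANG Xingguang, Ph.D. candidate.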
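  • Funding:
    National Natural Science Foundation of China (61702334, 61772200); Shanghai Pujiang Program (17PJ1401900); Natural Science Foundation of Shanghai (17ZR1406900, 17ZR1429700); Education and Research Fund of East China University of Science and Technology (ZH1726108).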

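Abstract: In real-world software development, software defect prediction helps testers locate the parts of a project that are likely to contain defects. The Abstract Syntax Tree (AST) of a module exposes hidden structural and semantic information that helps improve prediction accuracy. This paper proposes a defect prediction method based on a Heavy Son (HS) node-based AST. During node information extraction, the method keeps both the node's type and the value that carries the corresponding code semantics, and substitutes a special string for nodes that have no value. Using the idea of heavy-light decomposition (tree chain partitioning), the AST is split into HS nodes and Light Son (LS) nodes, and HS nodes are preferred when selecting the nodes of the serialized vector. A deep learning network then learns source-code structure and language from the node sequence to predict defects. Experimental results show that, compared with the DFS method, the proposed method raises the F1-measure and AUC of an attention-based Recurrent Neural Network (RNN) model by about 3% and 4% on average, yielding better defect prediction.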
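The node extraction and serialization steps can be illustrated with a small Python sketch. It assumes one plausible reading of "preferring HS nodes": a pre-order traversal that always descends into the heavy son (the child with the largest subtree, as in standard heavy-light decomposition) before the light sons, so each heavy chain forms a contiguous run of tokens. The `Node` class, the `type:value` token format, the placeholder string `<EMPTY>`, and the toy tree are all hypothetical; the paper's actual parser, token format, and placeholder string are not given here. A plain source-order DFS is included as the baseline the abstract compares against.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical placeholder for value-less nodes (the paper's actual string is not given).
NO_VALUE = "<EMPTY>"


@dataclass
class Node:
    """Minimal AST node: a type name, an optional value, and ordered children."""
    type: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def token(node: Node) -> str:
    """Keep the node type and its value; use the placeholder when there is no value."""
    return f"{node.type}:{node.value if node.value is not None else NO_VALUE}"


def subtree_sizes(root: Node) -> Dict[int, int]:
    """Map id(node) -> number of nodes in its subtree (iterative post-order)."""
    sizes: Dict[int, int] = {}
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            sizes[id(node)] = 1 + sum(sizes[id(c)] for c in node.children)
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in node.children)
    return sizes


def dfs_sequence(root: Node) -> List[str]:
    """Baseline: pre-order DFS visiting children in source order."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(token(node))
        stack.extend(reversed(node.children))
    return out


def heavy_son_sequence(root: Node) -> List[str]:
    """Pre-order DFS that visits the heavy son (largest subtree) first, then the
    light sons in source order, so each heavy chain is contiguous in the output."""
    sizes = subtree_sizes(root)
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(token(node))
        if not node.children:
            continue
        # max() returns the first maximal child, so ties fall back to source order.
        heavy = max(node.children, key=lambda c: sizes[id(c)])
        light = [c for c in node.children if c is not heavy]
        stack.extend(reversed([heavy] + light))
    return out


# Toy tree loosely modeled on `f(int a) { return a + 1; }` (node names are illustrative).
method = Node("MethodDeclaration", "f", [
    Node("FormalParameter", "a", [Node("BasicType", "int")]),
    Node("ReturnStatement", None, [
        Node("BinaryOperation", "+", [
            Node("MemberReference", "a"),
            Node("Literal", "1"),
        ]),
    ]),
])

print(dfs_sequence(method))
print(heavy_son_sequence(method))
```

On the toy tree, the heavy-son order emits the `ReturnStatement` subtree (4 nodes) before the `FormalParameter` subtree (2 nodes), whereas plain DFS follows source order. Breaking ties between equally sized children by source order is an arbitrary choice of this sketch.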

Keywords: software quality assurance, software defect prediction, code representation, Abstract Syntax Tree (AST), deep learning
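The abstract reports its gains as F1-measure and AUC. As a reference for what the two metrics measure, the following minimal pure-Python sketch computes both for a binary defective/clean labeling: F1 as the harmonic mean of precision and recall on the defective class, and AUC as the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one (the Mann-Whitney formulation of ROC AUC, with ties counted as 1/2). The labels, scores, and the 0.5 threshold are toy values for illustration, not data from the paper.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R) for the defective (positive) class; labels are 0 or 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (defective, clean) pairs where the defective module scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 1, 0, 0]               # toy labels, 1 = defective module
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]  # toy model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f1_score(y_true, y_pred))   # precision = recall = 2/3, so F1 = 2/3
print(auc_score(y_true, scores))  # 7 of 9 pairs ranked correctly, so AUC = 7/9
```

F1 depends on the chosen threshold, while AUC is computed from the raw scores and is threshold-free, which is one reason defect prediction studies often report both.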
