Computer Engineering ›› 2025, Vol. 51 ›› Issue (2): 223-237. doi: 10.19678/j.issn.1000-3428.0069687

• Computer Architecture and Software Technology •

Dead Code Detection Method Based on Convolutional Neural Network and Long Short-Term Memory

SUN Yikang, GAO Jianhua*

  1. Department of Computer Science and Technology, Shanghai Normal University, Shanghai 200234, China
  • Received: 2024-04-02 Online: 2025-02-15 Published: 2024-08-05
  • Contact: GAO Jianhua

  • Funding:
    National Natural Science Foundation of China (61672355)

Abstract:

Dead code is a code smell that gradually degrades software quality. Traditional dead code detection methods rely primarily on static analysis, code structure metrics, and heuristic rules. These methods vary considerably from one developer to another; they also pay little attention to the textual information in the source code and overlook how the code behaves during actual execution, which limits them significantly. To address these problems, a dead code detection method is designed that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) and integrates textual and code metric information to improve detection accuracy. First, dead code instances in an application are identified using tools such as DUM-Tool, then manually verified and labeled. The textual information of the source code is obtained by traversing the abstract syntax tree depth-first, the label values are matched with the textual information, and code metric information is extracted using the CK code metric extraction tool. Next, the textual information is transformed into word vectors using Word2Vec, a CNN extracts features from the code metric information, and the two are concatenated to form the dataset for dead code detection. Finally, an LSTM network is trained on this dataset, and a Sigmoid function performs the classification. The experimental results show that combining textual and metric information enables effective dead code detection, improving the average F1 value by up to 12.58 percentage points over traditional detection methods.

Key words: dead code, deep learning, textual information, code metrics, feature extraction
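
The text-extraction step described in the abstract (a depth-first traversal of the abstract syntax tree, followed by matching labels to the extracted text) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the parser or the target language, so Python's standard `ast` module stands in for it, and the function names, the sample source, and the labels are all hypothetical.

```python
import ast


def dfs_tokens(node):
    """Collect node-type names and identifiers in depth-first (pre-order) order."""
    tokens = [type(node).__name__]
    # Record identifier text carried directly by this node, if any.
    for attr in ("name", "id", "attr", "arg"):
        value = getattr(node, attr, None)
        if isinstance(value, str):
            tokens.append(value)
    # Visit children left to right before moving on to siblings.
    for child in ast.iter_child_nodes(node):
        tokens.extend(dfs_tokens(child))
    return tokens


def method_token_sequences(source):
    """Map each function name in `source` to its depth-first token sequence."""
    tree = ast.parse(source)
    return {
        node.name: dfs_tokens(node)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }


# Hypothetical sample: one function that is used and one that is never called.
SAMPLE = '''
def used(x):
    return x + 1

def never_called(y):
    return y * 2
'''

sequences = method_token_sequences(SAMPLE)

# Hypothetical labels (1 = dead, 0 = live), standing in for the output of a
# detection tool followed by manual verification.
labels = {"used": 0, "never_called": 1}

# Match each label with its textual information to form (tokens, label) pairs.
dataset = [(sequences[name], labels[name]) for name in sequences]
```

Each token sequence in `dataset` would then be the input to a word-embedding step such as Word2Vec; the metric features, CNN, and LSTM stages are outside the scope of this sketch.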
