Dead Code Detection Method Based on Convolutional Neural Network and Long Short-Term Memory

doi:10.19678/j.issn.1000-3428.0069687

Abstract

Dead code is a code smell that leads to the gradual deterioration of software quality. Traditional dead code detection methods rely primarily on static analysis techniques, code structure metrics, and heuristic rules. These methods vary considerably from one developer to another. Moreover, they pay limited attention to the textual information of the source code and overlook how the code behaves during actual execution, which leads to significant limitations. To address these problems, a dead code detection method is designed that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). The method integrates textual information and code metric information to improve the accuracy of dead code detection. First, dead code instances in an application are identified using tools such as DUM-Tool, then manually verified and labeled. The textual information of the source code is obtained by traversing the abstract syntax tree depth-first, and the label values are matched with this textual information. Code metric information is then extracted using the CK code metric extraction tool. Next, the textual information is transformed into word vectors using Word2Vec, and a CNN is used to extract features from the code metric information. The two sets of features are concatenated to form the dataset for dead code detection. Finally, an LSTM network is trained on this dataset, and classification is performed with a Sigmoid function. The experimental results show that integrating textual and metric information enables effective dead code detection, improving the average F1 value by up to 12.58 percentage points compared with traditional detection methods.

Key words: dead code, deep learning, textual information, code metrics, feature extraction
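
As a rough illustration of the text-extraction step described in the abstract, the sketch below walks a syntax tree depth-first and collects node types and identifiers as a token sequence, then pads or truncates the sequence to a fixed length (the step named in the Fig.7 caption). It is not the authors' implementation. It assumes Python's built-in ast module applied to Python source as a stand-in for a Java parser, and the names dfs_tokens, pad_or_truncate, and the <PAD> token are hypothetical.

```python
import ast


def dfs_tokens(source: str) -> list:
    """Collect node-type names and identifiers in pre-order (depth-first)."""
    tokens = []

    def visit(node):
        tokens.append(type(node).__name__)
        # Keep the identifier text where the node carries one.
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            tokens.append(node.name)
        elif isinstance(node, ast.Name):
            tokens.append(node.id)
        elif isinstance(node, ast.arg):
            tokens.append(node.arg)
        for child in ast.iter_child_nodes(node):
            # Skip Load/Store/Del context markers; they add no textual signal.
            if isinstance(child, ast.expr_context):
                continue
            visit(child)

    visit(ast.parse(source))
    return tokens


def pad_or_truncate(tokens, length, pad="<PAD>"):
    """Return exactly `length` tokens: cut long sequences, pad short ones."""
    return tokens[:length] + [pad] * max(0, length - len(tokens))


# Hypothetical example: a small function whose tokens are extracted.
example = "def unused(x):\n    return x\n"
sequence = pad_or_truncate(dfs_tokens(example), 12)
print(sequence)
```

Fixed-length sequences of this kind are what an embedding model such as Word2Vec would then map to vectors. That step, the CNN over code metrics, and the LSTM classifier are omitted here.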

SUN Yikang, GAO Jianhua. Dead Code Detection Method Based on Convolutional Neural Network and Long Short-Term Memory[J]. Computer Engineering, 2025, 51(2): 223-237.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069687

https://www.ecice06.com/EN/Y2025/V51/I2/223

Figures/Tables 17

Fig.1 Example code

Fig.2 Abstract syntax tree of example code

Fig.3 CBOW model structure

Fig.4 Skip-Gram model structure

Fig.5 LSTM network structure

Fig.6 The construction procedure of DCD-DL method

Fig.7 Padding and truncation examples

Fig.8 Feature extraction procedure for code metrics information using CNN

Fig.9 LSTM classification model

Fig.10 Distribution box plot of F1 values for each classifier
