
Computer Engineering ›› 2021, Vol. 47 ›› Issue (10): 125-131. doi: 10.19678/j.issn.1000-3428.0058958

• Cyberspace Security •

Code Clone Detection Method Based on Hierarchical Feature

ZHANG Dongmei (张冬梅), CHEN Yongle (陈永乐), YANG Yuli (杨玉丽)

  1. College of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Received: 2020-07-16 Revised: 2020-09-30 Published: 2020-10-21
  • About the authors: ZHANG Dongmei (born 1995), female, master's student, main research interest: information security; CHEN Yongle (corresponding author), associate professor, Ph.D.; YANG Yuli, lecturer, Ph.D.
  • Funding:
    Key Research and Development Program of Shanxi Province (201903D121121); Natural Science Youth Foundation of Shanxi Province, General Program (201901D211076).

Abstract: Existing code clone detection methods typically rely on a single form of token representation and on abstract syntax trees that are complex to construct. To address these problems, a code clone detection method based on hierarchical features is proposed. The method uses a two-layer Bi-directional Long Short-Term Memory (Bi-LSTM) network to extract deep semantic information at the line level and at the global code level, and thereby mines the semantic features of the target code. An attention mechanism is introduced to adjust the influence weights of important tokens and code lines, which improves the detection of semantic clones, and a softmax classifier is used to identify cloned code. Experimental results show that the method achieves a recall of 91% and a precision of 97%, and that it detects clones with complex semantics better than the NICAD, CCIS, and CCLearner methods.

Key words: token conversion, hierarchical feature, Bi-directional Long Short-Term Memory (Bi-LSTM) network, attention mechanism, code clone detection
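
The two-level attention described in the abstract (token weights within each line, then line weights within the whole code fragment, followed by a softmax classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. The Bi-LSTM encoders are omitted, so the token vectors are assumed to be already encoded; the dot-product scoring against the context vectors `token_ctx` and `line_ctx`, the absolute-difference pairing of two code vectors, and all function names are assumptions made for illustration only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors: List[Vector], context: Vector) -> Tuple[Vector, List[float]]:
    """Weight each vector by softmax(vector . context) and return the weighted sum.

    Assumption: plain dot-product scoring; the paper may score differently.
    """
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def encode_code(lines: List[List[Vector]], token_ctx: Vector,
                line_ctx: Vector) -> Tuple[Vector, List[float]]:
    """Hierarchical pooling: tokens -> line vectors -> one code vector.

    Each inner list holds the (assumed pre-encoded) token vectors of one code line.
    """
    line_vectors = [attention_pool(tokens, token_ctx)[0] for tokens in lines]
    return attention_pool(line_vectors, line_ctx)


def classify_pair(vec_a: Vector, vec_b: Vector, class_weights: List[Vector],
                  class_bias: List[float]) -> List[float]:
    """Softmax over [not clone, clone] logits.

    Assumption: the pair is compared via the element-wise absolute difference.
    """
    features = [abs(a - b) for a, b in zip(vec_a, vec_b)]
    logits = [dot(w, features) + b for w, b in zip(class_weights, class_bias)]
    return softmax(logits)
```

In a trained model the context vectors and classifier parameters would be learned together with the encoders; here they are passed in as plain lists so that the data flow from tokens to lines to a clone decision stays visible.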

CLC number: