Abstract:
This paper proposes a method for measuring the similarity of program source code. Based on the structural features of C source code, the method divides a program into function scopes and normalizes the code in each scope using a set of rules. Hash values are then computed over the resulting token sequence, and a hash value matching algorithm is used to measure the similarity between programs. Experimental results show that the method improves the accuracy of source code similarity measurement and runs efficiently.
Keywords: function scope, code normalization, hash value matching, similarity measurement
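The pipeline summarized in the abstract (normalize, tokenize, hash, match) can be illustrated with a minimal sketch. The abstract does not give the paper's normalization rules or its hash matching algorithm, so everything here is an assumption made for illustration: identifiers are mapped to `ID`, numbers to `NUM`, and string or character literals to `LIT`; hashes are CRC32 values of overlapping k-grams of tokens; and similarity is the Jaccard ratio of the two hash sets. The names `normalize`, `kgram_hashes`, and `similarity` are hypothetical, and the division into function scopes is omitted for brevity.

```python
import re
import zlib

# A small subset of C keywords; these are kept, all other identifiers become "ID".
C_KEYWORDS = {
    "int", "char", "float", "double", "void", "long", "short", "unsigned",
    "if", "else", "for", "while", "do", "return", "break", "continue",
    "switch", "case", "default", "struct", "const", "static", "sizeof",
}

# String literal | char literal | identifier | number | single punctuation char.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'|[A-Za-z_]\w*|\d+(?:\.\d+)?|[^\s\w]'
)


def normalize(source):
    """Turn C source text into a normalized token list (assumed rules)."""
    # Naive comment stripping: does not handle comment markers inside strings.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)
    source = re.sub(r"//[^\n]*", " ", source)
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok[0] in "\"'":
            tokens.append("LIT")
        elif tok[0].isdigit():
            tokens.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append(tok if tok in C_KEYWORDS else "ID")
        else:
            tokens.append(tok)
    return tokens


def kgram_hashes(tokens, k=4):
    """Hash every run of k consecutive tokens and return the set of hashes."""
    if not tokens:
        return set()
    if len(tokens) < k:
        return {zlib.crc32(" ".join(tokens).encode())}
    return {
        zlib.crc32(" ".join(tokens[i:i + k]).encode())
        for i in range(len(tokens) - k + 1)
    }


def similarity(src_a, src_b, k=4):
    """Jaccard similarity of the two programs' k-gram hash sets, in [0, 1]."""
    a = kgram_hashes(normalize(src_a), k)
    b = kgram_hashes(normalize(src_b), k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Under these assumptions, two functions that differ only in identifier names and comments normalize to the same token sequence and therefore score 1.0, while structurally unrelated code shares few or no k-gram hashes. Applying the same comparison per function scope, as the abstract describes, would additionally make the measure less sensitive to reordering of functions within a file.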
GU Ping, ZHANG Feng, ZHOU Hai-Tao. Method of Program Source Code Similarity Measurement[J]. Computer Engineering, 2012, 38(06): 37-39.
古平, 张锋, 周海涛. 一种程序源代码相似度度量方法[J]. 计算机工程, 2012, 38(06): 37-39.