
Computer Engineering, 2012, Vol. 38, Issue (06): 37-39.

• Networks and Communications

Method of Program Source Code Similarity Measurement

GU Ping, ZHANG Feng, ZHOU Hai-tao   

  1. (College of Computer Science, Chongqing University, Chongqing 400044, China)
  • Received: 2011-08-11; Online: 2012-03-20; Published: 2012-03-20

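  • About the authors: GU Ping (1976-), male, associate professor, Ph.D.; main research interests: natural language processing, machine learning. ZHANG Feng and ZHOU Hai-tao, master's students.
  • Foundation item:
    Fundamental Research Funds for the Central Universities, Special Research Project (CDJZR 10180008)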

Abstract: This paper proposes a method for measuring the similarity of program source code. Based on the structural features of C source code, the program is divided into function scopes, and the code in each scope is normalized according to a set of rules. A Hash value is computed for the resulting Token sequence, and a Hash value matching algorithm is then used to measure the similarity between programs. Experimental results show that the method improves the accuracy of source code similarity measurement and runs efficiently.
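The abstract describes the first two stages (function-scope division and rule-based normalization into a Token sequence) only at a high level. The Python sketch below illustrates one way they could work; it is not the authors' implementation. The tokenizer, the normalization rules (identifiers become `ID`, numeric literals `NUM`, string/character literals `STR`; comments, whitespace and preprocessor lines are dropped; keywords and operators are kept) and every function name here (`tokenize`, `split_functions`, `normalized_functions`) are hypothetical assumptions.

```python
import re

# Assumed rule: C89/C90 keywords are kept verbatim; every other identifier becomes "ID".
C_KEYWORDS = frozenset("""
    auto break case char const continue default do double else enum extern
    float for goto if int long register return short signed sizeof static
    struct switch typedef union unsigned void volatile while
""".split())

# Simplified C lexer (a sketch; it does not cover every corner of the C grammar).
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<preproc>\#[^\n]*)
  | (?P<string>"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')
  | (?P<number>0[xX][0-9A-Fa-f]+[uUlL]*|\d+(?:\.\d+)?[uUlLfF]*)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<op>->|\+\+|--|<<=|>>=|<<|>>|&&|\|\||[-+*/%&|^!=<>]=|[-+*/%=<>!&|^~?:;,.(){}\[\]])
  | (?P<space>\s+)
""", re.VERBOSE | re.DOTALL)


def tokenize(src):
    """Split C source into (kind, text) pairs, dropping comments, whitespace
    and preprocessor lines."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:  # character not covered by this sketch (e.g. '@'): skip it
            pos += 1
            continue
        pos = m.end()
        if m.lastgroup not in ("comment", "preproc", "space"):
            tokens.append((m.lastgroup, m.group()))
    return tokens


def _function_name(tokens, close_idx):
    """Walk back from the ')' at close_idx to its '(' and return the identifier before it."""
    depth = 0
    for j in range(close_idx, -1, -1):
        if tokens[j] == ("op", ")"):
            depth += 1
        elif tokens[j] == ("op", "("):
            depth -= 1
            if depth == 0:
                if j > 0 and tokens[j - 1][0] == "ident":
                    return tokens[j - 1][1]
                return "<anonymous>"
    return "<anonymous>"


def split_functions(tokens):
    """Function-scope division: map each function name to the tokens of its body.
    A top-level '{' directly after ')' is treated as the start of a function body;
    struct bodies and initializers at top level are therefore ignored."""
    funcs, depth, start, name = {}, 0, None, None
    for i, (kind, text) in enumerate(tokens):
        if kind != "op":
            continue
        if text == "{":
            if depth == 0 and i > 0 and tokens[i - 1] == ("op", ")"):
                name, start = _function_name(tokens, i - 1), i
            depth += 1
        elif text == "}":
            depth -= 1
            if depth == 0 and start is not None:
                funcs[name] = tokens[start:i + 1]  # body only, braces included
                start = None
    return funcs


def normalize(tokens):
    """Apply the assumed normalization rules to a list of (kind, text) tokens."""
    out = []
    for kind, text in tokens:
        if kind == "ident":
            out.append(text if text in C_KEYWORDS else "ID")
        elif kind == "number":
            out.append("NUM")
        elif kind == "string":
            out.append("STR")
        else:
            out.append(text)
    return out


def normalized_functions(src):
    """Return {function name: normalized Token sequence of its body}."""
    return {name: normalize(body) for name, body in split_functions(tokenize(src)).items()}
```

Under these assumed rules, renaming variables, changing literal values or editing comments leaves a function's normalized Token sequence unchanged, so a copied function disguised only in those ways produces the same sequence for the later Hash comparison.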

Key words: function scope, code normalization, Hash value matching, similarity measurement
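The abstract does not specify which hash function or which similarity formula the Hash value matching step uses. The Python sketch below swaps in one common choice as an illustration: hash every window of k consecutive normalized tokens with CRC-32 (`zlib.crc32`) and score the overlap of the two hash multisets with the Dice coefficient. The window size k = 4, the names `kgram_hashes`, `hash_similarity` and `program_similarity`, and the per-function best-match averaging are all assumptions, not the paper's algorithm. Inputs are lists of already-normalized tokens such as `["{", "int", "ID", "=", "NUM", ";", "}"]`.

```python
import zlib
from collections import Counter


def kgram_hashes(tokens, k=4):
    """Multiset of CRC-32 values, one per window of k consecutive tokens.

    Sequences shorter than k are hashed as a single window (assumption).
    """
    if not tokens:
        return Counter()
    if len(tokens) < k:
        windows = [tokens]
    else:
        windows = [tokens[i:i + k] for i in range(len(tokens) - k + 1)]
    # "\x00" separates tokens so that ["ab", "c"] and ["a", "bc"] hash differently
    return Counter(zlib.crc32("\x00".join(w).encode("utf-8")) for w in windows)


def hash_similarity(tokens_a, tokens_b, k=4):
    """Dice coefficient over matched hash values: 2*|A & B| / (|A| + |B|)."""
    ha, hb = kgram_hashes(tokens_a, k), kgram_hashes(tokens_b, k)
    total = sum(ha.values()) + sum(hb.values())
    if total == 0:
        return 1.0  # two empty sequences are treated as identical
    common = sum((ha & hb).values())  # Counter '&' keeps the minimum count per hash
    return 2 * common / total


def program_similarity(funcs_a, funcs_b, k=4):
    """Average, over the functions of program A, of each one's best match in B.

    funcs_a / funcs_b map function names to normalized token lists.
    The score is asymmetric: it measures how much of A is found in B.
    """
    if not funcs_a or not funcs_b:
        return 0.0
    best = [max(hash_similarity(fa, fb, k) for fb in funcs_b.values())
            for fa in funcs_a.values()]
    return sum(best) / len(best)
```

Hashing windows rather than each whole sequence is a deliberate choice in this sketch: with a single hash per function, editing one token would drop a match to zero, whereas window hashes give partially copied functions partial credit.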

