Computer Engineering, 2010, Vol. 36, Issue (4): 45-46. doi: 10.3969/j.issn.1000-3428.2010.04.016

• Software Technology and Database •

Research and Implementation of Program Code Similarity Measurement

YU Hai-ying   

  1. (College of Computer Information Management, Inner Mongolia Finance and Economics College, Hohhot 010070)
  • Received:1900-01-01 Revised:1900-01-01 Online:2010-02-20 Published:2010-02-20

Abstract: To measure program code similarity, a method combining attribute counting and structure metrics is proposed. Attribute counting tallies the operators and operands in program source code to produce three features, Halstead length, Halstead vocabulary and Halstead volume, which together form the feature vector of a program; attribute similarity is then computed as the cosine of the angle between two such vectors. Structure similarity is obtained with the longest common subsequence algorithm. Together, the two similarities measure the degree of similarity between a pair of programs. Experimental results show that the method can effectively detect similar program code in students' homework.

Key words: attribute counting, structure metrics, program code similarity
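
The two measurements named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The Halstead formulas used are the standard ones (length N = N1 + N2, vocabulary n = n1 + n2, volume V = N log2 n). Everything else is an assumption made for illustration: the toy tokenizer `TOKEN_RE`, the keyword set `KEYWORDS`, the choice of keywords and braces as the structural sequence, and the normalisation 2·LCS / (|a| + |b|). The abstract does not state how the two similarities are combined, so the sketch returns them separately.

```python
import math
import re

# Hypothetical token classes for a small C-like language (assumption:
# the paper's actual tokenizer and operator table are not given here).
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float"}
TOKEN_RE = re.compile(
    r"[A-Za-z_]\w*|\d+(?:\.\d+)?|==|!=|<=|>=|&&|\|\||[-+*/%=<>!(){}\[\];,]"
)


def tokenize(source):
    """Split source into tokens; unmatched characters are silently skipped."""
    return TOKEN_RE.findall(source)


def is_operator(tok):
    """Keywords and punctuation count as operators, the rest as operands."""
    return tok in KEYWORDS or not (tok[0].isalnum() or tok[0] == "_")


def halstead_vector(source):
    """Return (length N, vocabulary n, volume V) using the standard formulas."""
    tokens = tokenize(source)
    operators = [t for t in tokens if is_operator(t)]
    operands = [t for t in tokens if not is_operator(t)]
    length = len(operators) + len(operands)                 # N = N1 + N2
    vocabulary = len(set(operators)) + len(set(operands))   # n = n1 + n2
    volume = length * math.log2(vocabulary) if vocabulary else 0.0
    return (length, vocabulary, volume)


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def structure_sequence(source):
    """Assumed structural view: keep only keywords and braces, in order."""
    return [t for t in tokenize(source) if t in KEYWORDS or t in "{}"]


def structure_similarity(a, b):
    """Assumed normalisation: 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    return 2 * lcs_length(a, b) / total if total else 1.0


def compare(src_a, src_b):
    """Return (attribute similarity, structure similarity) for two programs."""
    attr = cosine_similarity(halstead_vector(src_a), halstead_vector(src_b))
    struct = structure_similarity(structure_sequence(src_a), structure_sequence(src_b))
    return attr, struct
```

Under these assumptions, two submissions that differ only in variable names, such as `int a = 1; if (a > 0) { a = a + 1; }` and the same text with `x` in place of `a`, have identical Halstead vectors and identical structural sequences, so `compare` reports both similarities as (approximately) 1.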
