
Computer Engineering ›› 2020, Vol. 46 ›› Issue (1): 222-228,242. doi: 10.19678/j.issn.1000-3428.0053886

• Architecture and Software Technology •

Program Comparison Analysis Method for Open Source Code Reuse

XU Fu, HAO Liang, CHEN Feixiang, LI Dongmei, CUI Xiaohui

  1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
  • Received: 2019-02-07 Revised: 2019-03-16 Online: 2020-01-15 Published: 2019-03-25
  • About the authors: XU Fu (born 1979), male, associate professor, Ph.D., main research interests: program analysis, software engineering, and forestry remote sensing; HAO Liang, master's degree; CHEN Feixiang (corresponding author), professor, Ph.D.; LI Dongmei, associate professor, Ph.D.; CUI Xiaohui, lecturer, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61772078); Beijing Municipal Key Research and Development Program (D171100001817003).

Abstract: Open source code reuse is an important software development model, but it currently faces two major problems: open source license infringement and keeping reused code in sync with updates. Exploiting the high similarity between successive code snapshots, this paper designs an efficient incremental analysis method for code repositories. On this basis, the Simhash algorithm is used to map each function's code to a function fingerprint, and a project similarity calculation method that takes the function as the basic unit of analysis is proposed, which reduces the storage space required for analysis results and speeds up code comparison. Three groups of experiments evaluate the method in terms of code analysis efficiency, project similarity determination, and function update detection. The results show that the method meets the needs of similarity detection and code traceability in open source code reuse and effectively shortens the overall analysis time.

Key words: open source software, code reuse, incremental analysis, program comparison, code traceability
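
The fingerprinting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the use of MD5 as the per-token hash, the 64-bit fingerprint width, the Hamming-distance threshold `max_distance`, and the ratio computed by `project_similarity` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width; not stated in the abstract


def tokenize(code):
    """Split source text into identifiers, numbers, and single symbols (assumed tokenizer)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def simhash(tokens):
    """Compute a 64-bit Simhash fingerprint from a list of tokens."""
    weights = [0] * BITS
    for tok in tokens:
        # Take the first 8 bytes of the MD5 digest as a 64-bit token hash.
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def project_similarity(fps_a, fps_b, max_distance=3):
    """Fraction of functions in project A that have a near-match in project B.

    This ratio is a hypothetical stand-in for the paper's similarity measure.
    """
    if not fps_a:
        return 0.0
    matched = sum(
        1 for fa in fps_a if any(hamming(fa, fb) <= max_distance for fb in fps_b)
    )
    return matched / len(fps_a)
```

Storing one integer per function instead of the function text is what keeps the stored analysis results small, and comparing two functions reduces to an XOR and a bit count.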
