
Computer Engineering

• Architecture and Software Technology •

Clone Detection Based on Suffix Array

SHI Qing-qing, ZHANG Li-ping, YIN Li-li, LIU Dong-sheng

  1. (Computer & Information Engineering College, Inner Mongolia Normal University, Hohhot 010022, China)
  • Received: 2012-09-14  Publication date: 2013-09-15  Release date: 2013-09-13
  • About the authors: SHI Qing-qing (1987-), male, M.S. candidate, research interests: software engineering, software testing; ZHANG Li-ping, associate professor, M.S.; YIN Li-li, M.S. candidate; LIU Dong-sheng, professor
  • Funding:
    Inner Mongolia Natural Science Foundation (2011MS0906)

Abstract: Programmers' copying, pasting, and modifying of source code introduce large amounts of cloned code into software systems, increasing the cost of software development and maintenance. To address this problem, this paper proposes a new clone detection method that uses a suffix-array-based algorithm to efficiently find repeated token substrings and, from them, detect code clones. A corresponding clone detection tool, SaCD, is developed and applied to 29 open-source software systems written in C. Experimental results show that SaCD detects Type-1 and Type-2 statement clones quickly and effectively, and that it runs nearly 20 times faster than the traditional clone detection tool CCFinderx.
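The abstract describes the pipeline only at a high level: turn source code into a token string, find repeated token substrings with a suffix array, and report them as clones. The sketch below shows one way to realize that pipeline. It is not SaCD's code, and every detail is an assumption of ours: the simplified regex tokenizer, the `$id`/`$lit` placeholders used to make Type-2 clones (renamed identifiers, changed literals) identical, the naive suffix-array construction, Kasai's LCP algorithm, the minimum clone length `min_len` counted in tokens, and the left-maximality filter. The names `normalize`, `suffix_array`, `lcp_array`, and `find_clones` are hypothetical.

```python
import re
from typing import List, Tuple

# Simplified C tokenizer (assumption): comments and preprocessor lines are not handled.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'                            # string literal
    r"|'(?:\\.|[^'\\])*'"                           # character literal
    r"|[A-Za-z_]\w*"                                # identifier or keyword
    r"|\d+(?:\.\d+)?"                               # number (simplified)
    r"|->|\+\+|--|&&|\|\||<<|>>|[-+*/%&|^=!<>]="    # multi-character operators
    r"|\S"                                          # any other single character
)
C_KEYWORDS = {"int", "char", "float", "double", "void", "struct", "if", "else",
              "for", "while", "do", "switch", "case", "break", "return"}


def normalize(code: str) -> List[str]:
    """Tokenize C code and replace identifiers and literals with placeholders,
    so code differing only in names or constants (Type-2) becomes identical."""
    out = []
    for tok in TOKEN_RE.findall(code):
        if tok in C_KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("$id")
        elif tok[0].isdigit() or tok[0] in "\"'":
            out.append("$lit")
        else:
            out.append(tok)
    return out


def suffix_array(s: List[str]) -> List[int]:
    """Naive construction by sorting suffixes (O(n^2 log n)); fine for a sketch.
    A linear-time algorithm such as DC3 would replace this on large inputs."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: List[str], sa: List[int]) -> List[int]:
    """Kasai's algorithm: lcp[r] = common-prefix length of suffixes sa[r-1] and sa[r]."""
    n = len(s)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    lcp, h = [0] * n, 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        h = max(h - 1, 0)
    return lcp


def find_clones(s: List[str], min_len: int) -> List[Tuple[int, int, int]]:
    """Report (start_a, start_b, length) for repeated token runs of >= min_len tokens.
    Simplification: only suffixes adjacent in the suffix array are paired."""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    clones = []
    for r in range(1, len(s)):
        if lcp[r] < min_len:
            continue
        a, b = sa[r - 1], sa[r]
        # Left-maximality: skip pairs that are just a match shifted by one token.
        if a == 0 or b == 0 or s[a - 1] != s[b - 1]:
            clones.append((min(a, b), max(a, b), lcp[r]))
    return clones


# Two hypothetical C functions that differ only in identifier names and one constant.
file_a = "int f(int x){ int s = 0; for (int i = 0; i < x; i++) s += i; return s; }"
file_b = "int g(int n){ int t = 1; for (int k = 0; k < n; k++) t += k; return t; }"
# A unique separator token keeps matches from spanning the file boundary.
tokens = normalize(file_a) + ["$SEP0"] + normalize(file_b)
print(find_clones(tokens, min_len=10))  # → [(0, 35, 34)]
```

Each function normalizes to the same 34-token string, so the whole body is reported once as a clone pair starting at token offsets 0 and 35; the left-maximality check suppresses the shorter, shifted copies of the same match. Skipping `normalize` and comparing raw tokens would restrict detection to Type-1 (exact) clones.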

Key words: code clone, clone detection, token string, suffix array, repeated substring, DC3 algorithm
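The keywords list the DC3 algorithm, a linear-time suffix-array construction method (the "skew" algorithm of Kärkkäinen and Sanders). It sorts the suffixes starting at positions i mod 3 ≠ 0 by naming character triples and recursing when names repeat, then sorts the mod-0 suffixes from those ranks and merges the two lists. The following is a minimal Python port of the published reference algorithm, offered as a sketch rather than as SaCD's implementation. The names `dc3` and `_radix_pass` are ours, and the input is assumed to be integers in 1..k (0 is reserved for padding), so token strings must first be mapped to integer ranks.

```python
from typing import List


def _radix_pass(idx: List[int], keys: List[int], offset: int, k: int) -> List[int]:
    """Stable counting sort of positions idx by keys[i + offset] (values 0..k)."""
    count = [0] * (k + 1)
    for i in idx:
        count[keys[i + offset]] += 1
    total = 0
    for c in range(k + 1):  # exclusive prefix sums = first output slot per key
        count[c], total = total, total + count[c]
    out = [0] * len(idx)
    for i in idx:
        out[count[keys[i + offset]]] = i
        count[keys[i + offset]] += 1
    return out


def dc3(s: List[int], k: int) -> List[int]:
    """Suffix array of s; every value must lie in 1..k (0 is reserved as padding)."""
    n = len(s)
    if n < 2:
        return list(range(n))
    s = s + [0, 0, 0]  # padding so triples near the end are defined
    n0, n1, n2 = (n + 2) // 3, (n + 1) // 3, n // 3
    n02 = n0 + n2
    # Positions i % 3 != 0; the extra n0 - n1 adds a dummy mod-1 suffix when n % 3 == 1.
    s12 = [i for i in range(n + n0 - n1) if i % 3 != 0]
    # LSD radix sort of the mod-1/mod-2 suffixes by their first three characters.
    sa12 = _radix_pass(s12, s, 2, k)
    sa12 = _radix_pass(sa12, s, 1, k)
    sa12 = _radix_pass(sa12, s, 0, k)
    # Name each distinct triple; mod-1 names fill the left half, mod-2 the right half.
    r12 = [0] * n02
    name, last = 0, None
    for p in sa12:
        triple = (s[p], s[p + 1], s[p + 2])
        if triple != last:
            name, last = name + 1, triple
        r12[p // 3 if p % 3 == 1 else p // 3 + n0] = name
    if name < n02:
        # Names not unique yet: recurse on the reduced string, then store ranks.
        sa12 = dc3(r12, name)
        for rank, p in enumerate(sa12):
            r12[p] = rank + 1
    else:
        sa12 = [0] * n02
        for p, r in enumerate(r12):
            sa12[r - 1] = p
    r12 += [0, 0, 0]  # rank 0 for empty suffixes past the end
    # Sort mod-0 suffixes by (first char, rank of the following mod-1 suffix).
    sa0 = _radix_pass([3 * p for p in sa12 if p < n0], s, 0, k)

    def pos12(t: int) -> int:  # index into sa12 -> position in s
        p = sa12[t]
        return 3 * p + 1 if p < n0 else 3 * (p - n0) + 2

    sa, p, t = [], 0, n0 - n1  # skip the dummy suffix, which always sorts first
    while p < n0 and t < n02:
        i, j = pos12(t), sa0[p]
        if sa12[t] < n0:  # mod-1 vs mod-0: compare (char, rank of next suffix)
            smaller = (s[i], r12[sa12[t] + n0]) <= (s[j], r12[j // 3])
        else:  # mod-2 vs mod-0: compare (char, char, rank two positions ahead)
            smaller = ((s[i], s[i + 1], r12[sa12[t] - n0 + 1])
                       <= (s[j], s[j + 1], r12[j // 3 + n0]))
        if smaller:
            sa.append(i)
            t += 1
        else:
            sa.append(j)
            p += 1
    sa.extend(sa0[p:])
    sa.extend(pos12(u) for u in range(t, n02))
    return sa


# "banana" encoded as a=1, b=2, n=3
print(dc3([2, 1, 3, 1, 3, 1], 3))  # → [5, 3, 1, 0, 4, 2]
```

For a token string, one option is `rank = {tok: r + 1 for r, tok in enumerate(sorted(set(tokens)))}` followed by `dc3([rank[t] for t in tokens], len(rank))`. Because this mapping preserves token order, the result matches a suffix array built by directly sorting the token suffixes.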
