Computer Engineering

• Architecture and Software Technology •

Clone Code Evolution Trace Building and Pattern Recognition Based on Graph Model

GE Guangshuai, LIU Dongsheng, ZHANG Liping, HOU Min

  1. (College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022, China)
  • Received: 2016-08-11 Online: 2017-05-15 Published: 2017-05-15
  • About the authors: GE Guangshuai (born 1988), male, master's student, main research interest: software system analysis; LIU Dongsheng (corresponding author) and ZHANG Liping, professors; HOU Min, lecturer, master's degree.
  • Funding:
    National Natural Science Foundation of China (61462071, 61363017); Natural Science Foundation of Inner Mongolia (2014MS0613, 2015MS0606, 2016MS0612); Scientific Research Program of Higher Education Institutions of Inner Mongolia Autonomous Region (NJZY16045).

Abstract: To address inaccurate clone tracking, cumbersome evolution pattern recognition, and the difficulty of handling clone class merging, this paper proposes an improved method for building clone code evolution traces and recognizing evolution patterns. Between adjacent versions, a probabilistic topic model produces a preliminary mapping of clone classes; the code location overlap rate and text similarity are then computed to map clone fragments, and the clone class mapping is repaired to obtain an accurate clone mapping between adjacent versions. Short-term evolution patterns are recognized from the number of mapped clone classes between adjacent versions and the degree to which their mappings cross. A graph model is then built with clone classes as vertices and mapping relationships as edges, and each clone class is labeled with a short-term evolution pattern according to how it arises. Clone genealogies are extracted with a breadth-first search algorithm, and long-term evolution patterns are identified from the types of clone classes in each genealogy and whether the genealogy contains a cycle. Experiments on 70 versions of 5 open-source software systems show that about 95% of clones remain stable during evolution, about 1% undergo merging or recombination, and the life cycle of about 80% of clone code does not exceed half of the total number of released versions.

Key words: graph model, clone tracking, evolution pattern, clone genealogy, clone code
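
The graph step described in the abstract (clone classes as vertices, mappings as edges, genealogies extracted by breadth-first search, cycles used to separate long-term patterns) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the vertex encoding `(version, class_id)`, the function names, and the use of in-degree to flag merges are assumptions made for illustration only.

```python
from collections import defaultdict, deque


def extract_genealogies(nodes, edges):
    """Group clone classes into genealogies (connected components) via BFS.

    nodes: iterable of hashable ids, assumed here to be (version, class_id)
    edges: iterable of (older, newer) mappings between adjacent versions
    """
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        adj[dst].append(src)  # undirected view, used only to find components

    seen = set()
    genealogies = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        genealogies.append(members)
    return genealogies


def merge_points(edges):
    """Clone classes mapped from two or more classes of the previous version
    (an assumed, simplified criterion for a merge)."""
    indegree = defaultdict(int)
    for _, dst in dict.fromkeys(edges):  # drop duplicate edges, keep order
        indegree[dst] += 1
    return [node for node, deg in indegree.items() if deg >= 2]


def has_cycle(members, edges):
    """A connected undirected graph has a cycle iff it has at least as many
    edges as vertices."""
    member_set = set(members)
    unique = {frozenset(e) for e in edges
              if e[0] in member_set and e[1] in member_set}
    return len(unique) >= len(member_set)


if __name__ == "__main__":
    # Hypothetical data: classes A and B in version 1 merge into C in version 2.
    nodes = [(1, "A"), (1, "B"), (2, "C"), (3, "D"), (1, "X"), (2, "Y")]
    edges = [((1, "A"), (2, "C")), ((1, "B"), (2, "C")),
             ((2, "C"), (3, "D")), ((1, "X"), (2, "Y"))]
    for genealogy in extract_genealogies(nodes, edges):
        print(genealogy, "cycle:", has_cycle(genealogy, edges))
    print("merge points:", merge_points(edges))
```

In this sketch a genealogy is simply a connected component of the mapping graph, so a split followed by a later merge shows up as a cycle, while a plain merge shows up only as a vertex with two incoming edges.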

CLC number: