Computer Engineering, 2023, Vol. 49, Issue (1): 210-222. doi: 10.19678/j.issn.1000-3428.0063345

• Computer Architecture and Software Technology •

Design Pattern Recognition Based on Similarity Scoring and Secondary Subsystems

WANG Lei (1, 2, 3), WANG Wenfa (1, 3), SONG Huina (1, 3), ZHANG Shuai (1, 2, 3)

  1. College of Mathematics and Computer Science, Yan'an University, Yan'an, Shaanxi 716000, China;
    2. Shaanxi Key Laboratory of Intelligent Processing for Energy Big Data, Yan'an, Shaanxi 716000, China;
    3. Joint Laboratory of Yan'an University and Shanghai Pactera (Big Data Application Development Direction), Yan'an, Shaanxi 716000, China
  • Received: 2021-11-25; Revised: 2022-05-09; Published: 2022-08-08

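  • About the authors: WANG Lei (born 1988), male, lecturer, Ph.D., whose main research interests are software engineering, big data principles and applications, and machine learning; WANG Wenfa, professor; SONG Huina, teaching assistant; ZHANG Shuai (corresponding author), master's student.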
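  • Funding:
    National Natural Science Foundation of China (62041212); Scientific Research Program of the Shaanxi Provincial Department of Education (21JK0988); Doctoral Research Start-up Project of Yan'an University (YDBK2019-51); Open Fund of the Shaanxi Key Laboratory of Intelligent Processing for Energy Big Data (IPBED22).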

Abstract: To find pattern instances in a system, most existing design pattern recognition methods match the original system directly against the design patterns. This introduces numerous false-positive or false-negative instances, which lowers both recall and precision. Building on previous studies, this study further investigates a design pattern detection method based on similarity scoring and secondary subsystems. Using information extracted from the system, both the system and the design patterns are represented as directed graphs/matrices. The system to be identified is then divided into several subsystems, which are further disassembled and reorganized into secondary subsystems whose number of classes equals the number of roles in the pattern to be identified. A similarity scoring algorithm determines whether each secondary subsystem is a pattern instance, and the resulting instances are further processed to obtain the final pattern instances. Experiments on three open-source projects, JHotDraw, JRefactory, and JUnit, yield average recall rates of 96.7%, 91.7%, and 100%, average precision of 94.9%, 91.5%, and 92.5%, and CPU time costs of 5 408 ms, 22 280 ms, and 3 284 ms, respectively. The results show that the method improves precision and time efficiency while maintaining a high recall rate.
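The pipeline described in the abstract can be sketched in Python with NumPy. This is a minimal, non-authoritative sketch under several assumptions: the "similarity scoring algorithm" is taken to be the iterative node-similarity update of Blondel et al., S ← (B S Aᵀ + Bᵀ S A) / ‖B S Aᵀ + Bᵀ S A‖_F, where A is the system (or subsystem) adjacency matrix and B the pattern adjacency matrix; each graph is reduced to a single 0/1 adjacency matrix, although the method may use several relation matrices; a subsystem is passed in as a list of class indices, because the abstract does not say how subsystems are formed; and the rule that accepts a secondary subsystem as an instance (each role takes its best-scoring class, the mapping is one-to-one, and every pattern relation is present) is hypothetical. The helper names `similarity_scores` and `detect_instances` and the toy classes are illustrative only.

```python
from itertools import combinations

import numpy as np


def similarity_scores(A, B, iters=20):
    """Blondel-style similarity scores between pattern graph B and subsystem graph A.

    A: n_A x n_A 0/1 adjacency matrix of the classes (A[i, j] = 1 if class i relates to j).
    B: n_B x n_B 0/1 adjacency matrix of the pattern roles.
    Returns an n_B x n_A matrix whose entry [r, c] scores how well class c fits role r.
    """
    S = np.ones((B.shape[0], A.shape[0]))  # start from all-ones, as in Blondel et al.
    for _ in range(iters):  # an even count: Blondel et al. take the limit of the even iterates
        S = B @ S @ A.T + B.T @ S @ A
        norm = np.linalg.norm(S)  # Frobenius norm
        if norm > 0:  # e.g. an edgeless subsystem gives S == 0; leave it at zero
            S = S / norm
    return S


def detect_instances(A, class_names, subsystem, B, role_names):
    """Hypothetical helper: split one subsystem into secondary subsystems and test each.

    subsystem: indices of the classes in one (first-level) subsystem; how those are
    formed is not modeled here. The acceptance rule below is an assumption.
    """
    k = B.shape[0]  # a secondary subsystem has as many classes as the pattern has roles
    instances = []
    for combo in combinations(subsystem, k):
        A_sub = A[np.ix_(combo, combo)]  # relations among the chosen classes only
        S = similarity_scores(A_sub, B)
        assign = S.argmax(axis=1)  # assumed rule: each role takes its best-scoring class
        if len(set(assign)) < k:
            continue  # two roles chose the same class, so no one-to-one mapping
        mapped = A_sub[np.ix_(assign, assign)]  # relations seen through the role mapping
        if np.all(mapped >= B):  # assumed rule: every pattern relation must be present
            instances.append(
                {role_names[r]: class_names[combo[assign[r]]] for r in range(k)}
            )
    return instances


# Toy example: a two-role "Child -> Parent" pattern and a three-class system.
B = np.array([[0, 1],
              [0, 0]])
A = np.array([[0, 1, 0],
              [0, 0, 0],
              [0, 0, 0]])
print(detect_instances(A, ["Circle", "Shape", "Logger"], [0, 1, 2], B, ["Child", "Parent"]))
# [{'Child': 'Circle', 'Parent': 'Shape'}]
```

Because each secondary subsystem has exactly k classes, the score matrix is k x k and a role-to-class mapping can be read straight off it. The cost is C(n, k) candidate combinations for a subsystem of n classes, which is presumably one reason to split the system into smaller subsystems first.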

Key words: design pattern recognition, precision, directed graph, secondary subsystem, software reverse engineering
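The recall and precision figures in the abstract follow the standard definitions, recall = TP / (TP + FN) and precision = TP / (TP + FP), where TP, FP, and FN count true-positive, false-positive, and false-negative pattern instances. The abstract does not say what the averages are taken over; the sketch below assumes a macro average over pattern types, and the counts in it are made up for illustration, not taken from the experiments. The function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall for one pattern type (0.0 when a denominator is zero, by convention here)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_average(counts):
    """Average precision and recall over pattern types; counts is a list of (tp, fp, fn)."""
    pairs = [precision_recall(*c) for c in counts]
    avg_p = sum(p for p, _ in pairs) / len(pairs)
    avg_r = sum(r for _, r in pairs) / len(pairs)
    return avg_p, avg_r


# Hypothetical counts for two pattern types (not data from the paper).
p, r = macro_average([(9, 1, 0), (4, 0, 1)])
print(round(p, 3), round(r, 3))  # 0.95 0.9
```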
