作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2018, Vol. 44 ›› Issue (11): 197-201,208. doi: 10.19678/j.issn.1000-3428.0047566

• 人工智能及识别技术 • 上一篇    下一篇

一种基于结构特征的树相似度计算方法

陈彦桦,李剑   

  1. 北京邮电大学 计算机学院,北京 100876
  • 收稿日期:2017-06-12 出版日期:2018-11-15 发布日期:2018-11-15
  • 作者简介:陈彦桦(1992—),男,硕士研究生,主研方向为机器学习、数据结构;李剑,副教授、博士生导师
  • 基金资助:

    国家自然科学基金(U1636106,61472048)

A Tree Similarity Computation Method Based on Structrue Feature

CHEN Yanhua,LI Jian   

  1. School of Computer Science and Technology,Beijing University of Posts and Telecommunications,Beijing 100876,China
  • Received:2017-06-12 Online:2018-11-15 Published:2018-11-15

摘要:

为高效计算树的相似度,提出基于树结构特征的相似度计算方法。通过构造K个节点的所有非同构形态子树,计算其同构个数并作为特征向量进行树的相似度计算。该方法摒弃了直接计算相似度的方式,利用树的结构特征间接表示树的相似度,可有效应用于大规模数据集的相似度计算。实验结果显示:在特征向量提取方面,随着树的节点规模增大,算法时间复杂度呈线性增加;在相似度计算方面,同类数据相似度0.7以上占比74%,不同类数据相似度不超过0.2,表明提取的特征向量能够较好地表征原程序

关键词: 子树同构, 结构特征, 相似度计算, 编辑距离, 聚类

Abstract:

In order to efficiently compute the similarity of tree,a method of computing the tree similarity based on structure feature is proposed.Firstly,all non-isomorphic sub-trees of K nodes are constructed,then the number of the isomorphic sub-tree from tree is calculated by using sub-trees constructed before.These numbers will be used to compute the similarity of tree as the feature vector.In this paper,the method of directly calculating similarity is abandoned,and the similarity of trees is indirectly represented by the structure features of trees.It can be applied to the similarity computation of large-scale datasets.Experimental results show that,in feature vectors extracting,with the size of the tree nodes increases,the time complexity of the proposed algorithm increases linearly;in similarity computing,the similarity of the same category higher than 0.7 are accounted for 74%,while the different categories do not exceed 0.2,which indicates that the extracted feature vectors are able to characterize the original application.

Key words: sub-tree isomorphism, structure feature, similarity computation, edit distance, clustering

中图分类号: