Computer Engineering

• Development Research and Engineering Application

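Improved Edit Distance Algorithm Based on Local Variability

WANG Weihong, LI Jun

  1. (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
  • Received: 2015-01-30 Online: 2015-07-15 Published: 2015-07-15
  • About the authors: WANG Weihong (born 1969), male, professor, main research interests: spatial information services and network information security; LI Jun, master's student.
  • Funding:
    National Natural Science Foundation of China (61340058); Zhejiang Provincial Natural Science Foundation (LZ14F020001).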



Abstract: To address the low computational efficiency of the classical edit distance algorithm when computing string similarity, an improved edit distance algorithm is proposed. It first finds the longest common prefix and the longest common suffix of the two strings, and then applies the classical edit distance algorithm to the remaining parts of the two strings. A proof by contradiction shows that the resulting distance equals the edit distance between the two original strings. On this basis, the advantages of the improved algorithm are analyzed and the algorithm is applied to Web page tamper detection. Experimental results show that, compared with the classical algorithm, the improved algorithm is more computationally efficient when computing the similarity between Web pages at the same URL.

Key words: edit distance, similarity, common prefix, common suffix, local variability, tamper detection

CLC number:
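
The procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes unit costs for insertion, deletion, and substitution, and the function names `classic_edit_distance` and `trimmed_edit_distance` are hypothetical. The common prefix and suffix are stripped first, and the classical dynamic-programming algorithm then runs only on what remains.

```python
def classic_edit_distance(a: str, b: str) -> int:
    """Classical dynamic-programming edit distance with unit costs (assumed)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def trimmed_edit_distance(a: str, b: str) -> int:
    """Strip the longest common prefix and suffix, then run the classical DP."""
    limit = min(len(a), len(b))

    # Length of the longest common prefix.
    start = 0
    while start < limit and a[start] == b[start]:
        start += 1

    # Length of the longest common suffix that does not overlap the prefix.
    end = 0
    while end < limit - start and a[len(a) - 1 - end] == b[len(b) - 1 - end]:
        end += 1

    return classic_edit_distance(a[start:len(a) - end], b[start:len(b) - end])


if __name__ == "__main__":
    # Hypothetical pages that differ only in a small local region.
    old_page = "<html><body>hello</body></html>"
    new_page = "<html><body>hacked</body></html>"
    print(trimmed_edit_distance(old_page, new_page))
```

When two versions of a page share most of their markup, the dynamic-programming table shrinks from the full string lengths to the lengths of the differing middle parts, which is where the efficiency gain described in the abstract would come from.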