Abstract: The classical edit distance algorithm is computationally inefficient when used to measure the similarity of two strings. To address this, an improved edit distance algorithm is proposed. It first finds the longest common prefix and the longest common suffix of the two strings, and then applies the classical edit distance algorithm to the remaining parts of the two strings. A proof by contradiction shows that the edit distance between these remainders equals the edit distance between the two original strings. On this basis, the advantages of the improved algorithm are analyzed, and the algorithm is applied to Web page tamper detection. Experimental results show that, compared with the classical algorithm, the improved algorithm is more computationally efficient when computing the similarity between versions of a page at the same URL.
Key words:
edit distance,
similarity,
common prefix,
common suffix,
local variability,
tamper detection
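The procedure summarized in the abstract can be illustrated with a minimal sketch. It assumes the classical algorithm is the standard unit-cost Levenshtein distance; the function names `levenshtein` and `trimmed_edit_distance` are hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classical unit-cost edit distance via dynamic programming (two rolling rows)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def trimmed_edit_distance(a: str, b: str) -> int:
    """Strip the longest common prefix and suffix, then run the classical algorithm.

    Assumption: this mirrors the idea described in the abstract; it is not
    the authors' implementation.
    """
    limit = min(len(a), len(b))

    # Length of the longest common prefix.
    p = 0
    while p < limit and a[p] == b[p]:
        p += 1

    # Length of the longest common suffix that does not overlap the prefix.
    s = 0
    while s < limit - p and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1

    # Only the differing middle parts go through the quadratic-time step.
    return levenshtein(a[p:len(a) - s], b[p:len(b) - s])
```

When two versions of a page differ only in a small region, the trimmed remainders are short, so the quadratic dynamic-programming step runs on far less input than the full strings.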
CLC number:
WANG Weihong, LI Jun. Improved Edit Distance Algorithm Based on Local Variability[J]. Computer Engineering.