摘要: BMH2C算法综合BMH和BMHS算法,利用当前窗口字符t[k]及其下一字符t[k+1]组成的双字符串来决定模式串右移量,具有比BM算法、BMH算法、BMHS算法更优的性能。但对于双字符串在模式串中出现一次及以上的情况,BMH2C算法中的模式串右移量仍有待进一步增大,从而减少当前窗口右移次数,提高BMH2C算法的匹配效率。为此,在BMH2C算法的基础上提出一种改进算法,该算法考虑双字符串t[k]t[k+1]在模式串中出现的次数,以及该双字符串在模式串中对应位置的后继字符与字符t[k+2]的相等关系。改进算法利用2个右移数组和1个模式串预处理数组,在匹配过程中通过判断字符t[k+2]与模式串预处理数组中相应字符是否相等,从而选择2个右移数组之一的对应值作为当前窗口的右移量。实验结果显示,在相同条件下,对于当前窗口移动次数和匹配所耗时间,BMH2C改进算法比BMH2C算法分别平均减少11.33%和9.40%,有效提高了匹配效率。
关键词:
模式匹配,
BMH2C算法,
字符串,
右移,
预处理
Abstract: BMH2C algorithm combines BMH and BMHS algorithm. In BMH2C algorithm, the right shift distance of pattern string is determined by the double string, which is made of the character t[k] of the current window and the next character t[k+1]. BMH2C algorithm has better performance than BM, BMH and BMHS algorithm. Nevertheless, the right shift distance of pattern string in the BMH2C algorithm remains to be further increased, when the double string appears for one time or more than one time. Do like that, the number of window’s shift will be greatly reduced and the matching speed improved effectively. Therefore, an improved algorithm based on BMH2C algorithm is proposed. It takes the number of appearance in the pattern string of the double string t[k]t[k+1] into account. And the equality relationship of character t[k+2] and the character which is followed the appropriate position of t[k]t[k+1] in the pattern string is considered. The improved algorithm uses two right shift arrays and a pretreated array of the pattern strings. During the matching, the corresponding value of one of the two right shift arrays is selected as the right shift distance of current window, by judging the equality relationship of character t[k+2] and the corresponding character in the pretreated array. Experimental results show that the improved BMH2C algorithm is respectively 11.33% and 9.40% less than BMH2C algorithm on average in the same conditions, for the matching time and the number of window’s shift. With the algorithm, the matching speed is improved effectively
Key words:
pattern matching,
BMH2C algorithm,
character string,
right shift,
pretreatment
中图分类号: