
Computer Engineering ›› 2012, Vol. 38 ›› Issue (11): 173-176. doi: 10.3969/j.issn.1000-3428.2012.11.053

• Artificial Intelligence and Recognition Technology •

FACA: A Multiple Pattern Matching Algorithm Based on AC Automata

CHEN Xin-chi, HAN Jian-min, JIA Jiong

  1. (Department of Computer, Zhejiang Normal University, Jinhua 321004, China)
  • Received: 2011-08-17 Online: 2012-06-05 Published: 2012-06-05
  • About the authors: CHEN Xin-chi (born 1991), male, undergraduate student, main research interest: pattern matching algorithms; HAN Jian-min, associate professor, Ph.D.; JIA Jiong, professor
  • Supported by:
    National Natural Science Foundation of China (61170108, 6110019); Zhejiang Province Xinmiao Talent Program (2011R404018)

Abstract: When a match fails, the Aho-Corasick automaton algorithm may have to backtrack several times before it reaches a valid successor state. To address this, the paper proposes FACA, a fast multiple pattern matching algorithm based on the Aho-Corasick automaton. FACA builds a mismatch successor pointer for each state, so that on a mismatch the valid successor state is found directly through that pointer; this avoids the excessive backtracking of the Aho-Corasick automaton and improves matching efficiency. While building the automaton, the algorithm uses dynamic programming to compute per-state information such as matching length and match count. During matching, this information is used to report statistics such as the number of times the pattern strings occur in the main string and the position of the earliest occurrence. Experimental results show that the algorithm matches accurately, is efficient, and supports online operation.

Key words: pattern matching, automata, dynamic programming, Trie tree
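
The idea summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' FACA implementation, whose data structures are not given on this page. It assumes the standard technique of completing the Aho-Corasick transition table, so that every state has a precomputed successor even for mismatching characters, and it accumulates match counts along failure links during construction. The names `build_automaton` and `count_matches`, the table layout, and the choice to report the end index of the first match are all hypothetical. Patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie, then precompute mismatch successors and match counts.

    Assumed layout (not taken from the paper):
      goto[s][c]   -> next state for character c, defined for every c in the
                      pattern alphabet, including characters that mismatch at s
      out_count[s] -> number of patterns ending at s or at any of its suffixes
    """
    alphabet = sorted({c for p in patterns for c in p})
    goto, fail, out_count = [{}], [0], [0]

    # 1. Plain trie insertion.
    for p in patterns:
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out_count.append(0)
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out_count[s] += 1

    # 2. Breadth-first pass over the trie.
    queue = deque()
    for c in alphabet:
        if c in goto[0]:
            queue.append(goto[0][c])  # depth-1 states fail to the root
        else:
            goto[0][c] = 0            # a mismatch at the root stays at the root
    while queue:
        s = queue.popleft()
        # Dynamic programming: fail[s] is shallower, so its count is final.
        out_count[s] += out_count[fail[s]]
        for c in alphabet:
            if c in goto[s]:
                t = goto[s][c]
                fail[t] = goto[fail[s]][c]
                queue.append(t)
            else:
                # Mismatch successor: copy from the completed row of fail[s].
                goto[s][c] = goto[fail[s]][c]
    return goto, out_count


def count_matches(text, patterns):
    """Return (total occurrences, end index of the first match or None)."""
    goto, out_count = build_automaton(patterns)
    state, total, first_end = 0, 0, None
    for i, c in enumerate(text):
        # One lookup per character, with no backtracking loop.
        # Characters outside the pattern alphabet reset to the root.
        state = goto[state].get(c, 0)
        if out_count[state]:
            total += out_count[state]
            if first_end is None:
                first_end = i
    return total, first_end
```

For example, `count_matches("ushers", ["he", "she", "his", "hers"])` returns `(3, 3)`: "she" and "he" both end at index 3, and "hers" ends at index 5. The trade-off in this sketch is memory: each state stores one entry per alphabet character in exchange for a single table lookup per input character.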

CLC number: