计算机工程 ›› 2019, Vol. 45 ›› Issue (4): 148-156.doi: 10.19678/j.issn.1000-3428.0049992

• 人工智能及识别技术 • 上一篇    下一篇

一种针对DFA状态爆炸的正则表达式匹配方法

王翔1,2,3,卢毓海2,3,马伟2,3,刘燕兵2,3   

  1. 1.中国科学院大学 网络空间安全学院,北京 100049; 2.中国科学院信息工程研究所,北京 100093; 3.信息内容安全技术国家工程实验室,北京 100093
  • 收稿日期:2018-01-08 出版日期:2019-04-15 发布日期:2019-04-15
  • 作者简介:王翔(1993—),男,硕士研究生,主研方向为模式匹配、内容过滤;卢毓海,博士研究生;马伟、刘燕兵,副研究员。
  • 基金项目:

    国家重点研发计划(2016YFB0800303);中国科学院信息工程研究所基础前沿项目(Y7Z0351101)。

A Regular Expression Matching Method for DFA State Explosion

WANG Xiang1,2,3,LU Yuhai2,3,MA Wei2,3,LIU Yanbing2,3   

  1. 1.School of Cyber Security,University of Chinese Academy of Sciences,Beijing 100049,China; 2.Institute of Information Engineering,Chinese Academy of Sciences,Beijing 100093,China; 3.National Engineering Laboratory for Information Security Technologies,Beijing 100093,China
  • Received:2018-01-08 Online:2019-04-15 Published:2019-04-15

摘要:

针对基于确定有限状态自动机的匹配引擎在大规模、复杂规则下会出现状态爆炸的问题,提出正则表达式子串抽取算法。通过将子串抽取算法应用于DFA状态爆炸场景,设计基于子串抽取的正则匹配引擎。实验结果表明,该算法在单个规则上运行时间可达10 ms量级,抽取率高达99%,同时匹配引擎具有较好的稳定性和可拓展性,且匹配速度优于相关开源匹配引擎。

关键词: 正则表达式, 确定有限自动机, 状态爆炸, 子串抽取, 匹配引擎

Abstract:

Aiming at the problem that the Deterministic Finite Automation(DFA)-based matching engine will explode under large-scale and complex rules,a regular expression substring extraction algorithm is proposed.A substring extraction based regular matching engine is designed by applying the substring extraction algorithm to the DFA state explosion scenario.Experimental results show that the algorithm can run on a single rule up to 10 ms,and the decimation rate is as high as 99%.At the same time,the matching engine has better stability and scalability,and the matching speed is better than the related open source matching engine.

Key words: regular expression, Deterministic Finite Automaton(DFA), state explosion, substring extraction, matching engine

中图分类号: