
Computer Engineering


Pre-trained Language Model-Based Web Application Firewalls Enhancement Method


  • Published: 2025-06-19


Abstract: A Web Application Firewall (WAF) is an effective tool for protecting web applications from cyberattacks, and the rapid growth of web applications in recent years has made WAF research increasingly relevant. Common approaches to building a WAF are rule-based and machine learning-based. Rule-based WAFs detect attacks with a predefined set of rules, which are often overly complex and difficult to update, whether dynamically or manually. Machine learning-based WAFs, typically built on methods such as Support Vector Machines, classify payloads but struggle to recognize newly emerging malicious payloads as reliably as rule-based methods, and they do not match the breadth of coverage that rule-based approaches provide. To address these limitations, this paper proposes a WAF enhancement method based on pre-trained language models that strengthens rule-based WAFs. The method first fine-tunes a pre-trained language model on collected malicious and benign payloads to give it preliminary discriminative capability. The model is then iteratively fine-tuned on malicious payloads intercepted by the WAF so that it learns the textual features of those payloads. At deployment time, the pre-trained language model is placed in front of the WAF to perform initial payload screening; in addition, returning deceptive responses to some of the requests intercepted by the model further strengthens the method. Adversarial experiments were conducted on two open-source WAFs against SQL injection and cross-site scripting attacks using two attack methods. After enhancement with the pre-trained language model, the average interception rates for payloads generated by the two attack methods rose from 40.01% and 36.07% to 96.91% and 97.13%, respectively, while the false positive rate remained zero, validating the effectiveness of the proposed method.
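The first fine-tuning stage described in the abstract can be illustrated with a minimal sketch: a pre-trained transformer is trained as a binary classifier on labeled benign and malicious payloads. The model choice (distilbert-base-uncased), the toy payload lists, and the hyperparameters below are illustrative assumptions, not the paper's configuration.

# Sketch of the initial fine-tuning stage: a pre-trained language model is
# trained to label payloads as benign (0) or malicious (1).
# Assumed setup for illustration only; requires torch and transformers.
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class PayloadDataset(Dataset):
    """Wraps raw payload strings and 0/1 labels (0 = benign, 1 = malicious)."""
    def __init__(self, payloads, labels, tokenizer, max_len=128):
        self.enc = tokenizer(payloads, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return {"input_ids": self.enc["input_ids"][i],
                "attention_mask": self.enc["attention_mask"][i],
                "labels": self.labels[i]}

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# Toy examples standing in for the collected benign/malicious payload corpus.
payloads = ["id=42&page=home", "' OR 1=1 --", "<script>alert(1)</script>"]
labels = [0, 1, 1]
loader = DataLoader(PayloadDataset(payloads, labels, tokenizer),
                    batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for batch in loader:
        optimizer.zero_grad()
        loss = model(**batch).loss  # cross-entropy over the two classes
        loss.backward()
        optimizer.step()

The iterative stage described in the abstract would repeat this training loop, each time adding payloads newly intercepted by the rule-based WAF to the malicious side of the training set.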

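The deployment stage can likewise be sketched as a simple pre-filter: the fine-tuned model scores each incoming payload before the rule-based WAF sees it, and a fraction of the requests it flags receive a deceptive response instead of a plain block. The classify() helper, the threshold, the decoy rate, and the waf_allows/backend callables are hypothetical names introduced for illustration; the paper does not prescribe this exact wiring.

# Toy pre-filter placed in front of the rule-based WAF, using the fine-tuned
# model and tokenizer from the previous sketch. Threshold and decoy rate are
# assumed values for illustration.
import random
import torch

DECOY_RATE = 0.3  # assumed fraction of flagged requests answered with a decoy
FAKE_RESPONSE = "HTTP/1.1 200 OK\r\n\r\n<html>login ok</html>"  # deceptive reply

def classify(model, tokenizer, payload):
    """Return the model's estimated probability that the payload is malicious."""
    enc = tokenizer(payload, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def handle_request(payload, model, tokenizer, waf_allows, backend, threshold=0.5):
    if classify(model, tokenizer, payload) >= threshold:
        # Flagged by the language model: answer some requests with a decoy,
        # reject the rest outright.
        return FAKE_RESPONSE if random.random() < DECOY_RATE else "403 Forbidden"
    if not waf_allows(payload):
        # The rule-based WAF still applies its own rules behind the model.
        return "403 Forbidden"
    return backend(payload)  # benign traffic reaches the web application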