Computer Engineering (计算机工程)

Adaptive dynamic defense model based on POMDP and adversarial meta-reinforcement learning

Published: 2026-04-14

Abstract: Existing dynamic defense models suffer from limited adaptive capacity, insufficient adversarial robustness, and inadequate consideration of defense costs. To address these issues, this paper proposes an adaptive dynamic defense model based on the asynchronous advantage actor-critic (A3C) algorithm that integrates meta-learning and adversarial training. The model formalizes the defense process as a partially observable Markov decision process (POMDP), designs a reward function that combines penalties for false positives and false negatives with operational costs, and builds a three-layer collaborative optimization framework. The inner layer performs efficient policy search with A3C. The middle layer introduces projected gradient descent (PGD) adversarial training, which improves robustness to adversarial perturbations through a minimax game. The outer layer uses model-agnostic meta-learning (MAML) to construct a meta-optimizer that lets the model adapt quickly to new attacks from only a few samples. Experiments on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets show that the model achieves an optimal defense decision rate (ODR) above 92% on all three datasets and reduces average defense resource consumption by approximately 60%. Under high-intensity perturbations, the attack success rate (ASR) stays below 38.2% without performance collapse, and detection accuracy on zero-day attacks can be raised above 88%. This research offers a feasible path toward intelligent dynamic defense systems that are highly adaptive, robust, and cost-effective.
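The abstract names two concrete ingredients: a reward that trades false-positive and false-negative penalties against operational cost, and PGD as the inner maximization of the minimax game. It does not give the reward formula or the threat model. The Python sketch below illustrates both ideas under explicitly hypothetical assumptions. These assumptions include the binary `ALLOW`/`BLOCK` action set, the weights in `RewardWeights`, the linear `attack_score` standing in for the learned A3C policy, and the L-infinity budget `eps` with features scaled to [0, 1]. None of these come from the paper. `pgd_evade` shows only the attacker's step. Adversarial training would then update the policy on such perturbed observations.

```python
from dataclasses import dataclass

# Hypothetical binary action space; the abstract does not list the real one.
ALLOW, BLOCK = 0, 1


@dataclass(frozen=True)
class RewardWeights:
    # Illustrative values only, not the authors' settings.
    hit: float = 1.0          # bonus for blocking a real attack
    fp_penalty: float = 0.5   # false positive: benign traffic blocked
    fn_penalty: float = 2.0   # false negative: attack let through
    cost_weight: float = 0.1  # scale on the action's operational cost


def defense_reward(action: int, is_attack: bool, op_cost: float,
                   w: RewardWeights = RewardWeights()) -> float:
    """Detection term (with FP/FN penalties) minus a weighted operational cost."""
    if action == BLOCK:
        detection = w.hit if is_attack else -w.fp_penalty
    else:
        detection = -w.fn_penalty if is_attack else 0.0
    return detection - w.cost_weight * op_cost


def attack_score(obs, weights, bias):
    """Hypothetical linear score the defender uses to flag an observation."""
    return sum(o * wi for o, wi in zip(obs, weights)) + bias


def pgd_evade(obs, weights, eps=0.1, step=0.02, iters=10):
    """L-infinity PGD that lowers attack_score within an eps-ball around obs.

    For a linear score the gradient w.r.t. the observation is `weights`, so each
    step moves feature i by -step * sign(weights[i]) and then projects it back
    into [obs[i] - eps, obs[i] + eps], intersected with the valid range [0, 1].
    """
    x = list(obs)
    for _ in range(iters):
        for i, wi in enumerate(weights):
            sign = (wi > 0) - (wi < 0)  # sign of the gradient component
            lo = max(obs[i] - eps, 0.0)
            hi = min(obs[i] + eps, 1.0)
            x[i] = min(max(x[i] - step * sign, lo), hi)  # step, then project
    return x


# Usage with made-up numbers: compare the score before and after the attack.
obs = [0.8, 0.2, 0.5]
weights = [1.0, -0.5, 2.0]
clean = attack_score(obs, weights, bias=-1.0)
adv_obs = pgd_evade(obs, weights)
adv = attack_score(adv_obs, weights, bias=-1.0)
print(f"clean score {clean:.2f}, adversarial score {adv:.2f}")
```

Because the score here is linear, PGD reaches the closed-form worst case. The attack lowers the score by `eps` times the L1 norm of the weights. Against a neural policy, the gradient would instead come from automatic differentiation, and the number of iterations would actually matter.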