Adaptive Dynamic Defense Model Based on POMDP and Adversarial Meta-Reinforcement Learning

doi:10.19678/j.issn.1000-3428.0260033

Abstract

To address the limited adaptability, insufficient adversarial robustness, and inadequate consideration of defense cost in existing dynamic defense models, this paper proposes an adaptive dynamic defense model based on the asynchronous advantage actor-critic (A3C) algorithm that integrates meta-learning and adversarial training. The model formalizes the defense process as a partially observable Markov decision process (POMDP). It designs a reward function that combines penalties for false positives and false negatives with operational costs, and it builds a three-layer collaborative optimization framework. The inner layer performs efficient policy search with A3C. The middle layer introduces projected gradient descent (PGD) adversarial training, which strengthens robustness to adversarial perturbations through a minimax game. The outer layer uses model-agnostic meta-learning (MAML) to build a meta-optimizer that lets the model adapt quickly to new attacks from only a few samples. Experiments on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets show that the model's optimal defense decision rate (ODR) exceeds 92% on all three datasets and that average defense resource consumption falls by approximately 60%. Under high-intensity perturbations, the attack success rate (ASR) stays below 38.2% with no performance collapse, and detection accuracy on zero-day attacks can be raised to more than 88%. This work offers a feasible path toward intelligent dynamic defense systems with high adaptability, strong robustness, and high cost-effectiveness.
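In a POMDP the defender never sees the true network state directly. Instead it keeps a belief, a probability distribution over states, and updates it after each action and observation with Bayes' rule: b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). The sketch below shows only this standard belief update and rests on stated assumptions. The two states (`normal`, `compromised`), two actions (`monitor`, `isolate`), two observations (`alert`, `quiet`), and every probability are hypothetical toy values, not the paper's state space or model.

```python
from typing import Dict

# Hypothetical toy model: none of these names or numbers come from the paper.
STATES = ("normal", "compromised")

# T[action][s][s_next]: transition probabilities (each inner row sums to 1).
T = {
    "monitor": {
        "normal": {"normal": 0.9, "compromised": 0.1},
        "compromised": {"normal": 0.0, "compromised": 1.0},
    },
    "isolate": {
        "normal": {"normal": 1.0, "compromised": 0.0},
        "compromised": {"normal": 0.8, "compromised": 0.2},
    },
}

# O[s_next][obs]: probability of an IDS observation in the new state
# (assumed independent of the action, for simplicity).
O = {
    "normal": {"alert": 0.2, "quiet": 0.8},
    "compromised": {"alert": 0.7, "quiet": 0.3},
}


def belief_update(belief: Dict[str, float], action: str, obs: str) -> Dict[str, float]:
    """Bayes filter: b'(s') is proportional to O(obs|s') * sum_s T(s'|s,a) * b(s)."""
    unnormalized = {}
    for s_next in STATES:
        predicted = sum(T[action][s][s_next] * belief[s] for s in STATES)
        unnormalized[s_next] = O[s_next][obs] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation is impossible under the current belief")
    return {s: p / total for s, p in unnormalized.items()}


b0 = {"normal": 0.5, "compromised": 0.5}
b1 = belief_update(b0, "monitor", "alert")
print(b1)  # the belief shifts toward "compromised" after an alert
```

A policy trained on a POMDP typically conditions on such a belief or on a learned summary of the observation history. The abstract does not say which representation the authors use.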
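The abstract states that the reward combines penalties for false positives and false negatives with an operational cost. It gives neither the functional form nor the weights. The sketch below therefore assumes the simplest linear combination: a bonus for a correct decision, a fixed penalty for each error type, and a cost term proportional to the resources an action consumes. `RewardWeights`, `defense_reward`, and all default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the paper's actual values are not given in the abstract."""
    correct_bonus: float = 1.0   # correct decision (defend an attack, or stay idle when benign)
    fp_penalty: float = 0.5      # false positive: defending when no attack is present
    fn_penalty: float = 2.0      # false negative: failing to defend against a real attack
    op_cost_weight: float = 0.1  # scale applied to the action's operational cost


DEFAULT_WEIGHTS = RewardWeights()


def defense_reward(attack_present: bool, defended: bool, op_cost: float,
                   w: RewardWeights = DEFAULT_WEIGHTS) -> float:
    """One-step reward: decision term minus weighted operational cost."""
    if defended and not attack_present:
        decision = -w.fp_penalty
    elif attack_present and not defended:
        decision = -w.fn_penalty
    else:
        decision = w.correct_bonus
    return decision - w.op_cost_weight * op_cost


print(defense_reward(attack_present=True, defended=True, op_cost=2.0))   # correct but costly
print(defense_reward(attack_present=True, defended=False, op_cost=0.0))  # missed attack
```

Setting `fn_penalty` above `fp_penalty` encodes the common assumption that a missed attack costs more than a false alarm. A cost term like `op_cost_weight` is one way a reward can steer a policy toward cheaper actions. The relative weights here are illustrative only.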
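In A3C (Mnih et al., 2016), several worker threads each roll out a short trajectory and compute n-step returns bootstrapped from the critic. Each worker then pushes gradients of a combined actor-critic loss to shared parameters asynchronously. The sketch below covers only the per-worker loss arithmetic on plain floats: n-step returns, advantages A_t = R_t - V(s_t), and the total loss (policy term, value term, entropy bonus). The function names and default coefficients are hypothetical. A real implementation would build this loss in an autodiff framework and treat the advantage as a constant inside the policy term.

```python
from typing import List, Sequence, Tuple


def nstep_returns(rewards: Sequence[float], bootstrap_value: float,
                  gamma: float = 0.99, terminal: bool = False) -> List[float]:
    """R_t = r_t + gamma * R_{t+1}, seeded with V(s_{t+n}), or 0 at a terminal state."""
    running = 0.0 if terminal else bootstrap_value
    returns = []
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a3c_worker_loss(rewards: Sequence[float], values: Sequence[float],
                    log_probs: Sequence[float], entropies: Sequence[float],
                    bootstrap_value: float, gamma: float = 0.99,
                    value_coef: float = 0.5, entropy_coef: float = 0.01,
                    terminal: bool = False) -> Tuple[List[float], float]:
    """Return (advantages, total loss) for one worker's rollout."""
    returns = nstep_returns(rewards, bootstrap_value, gamma, terminal)
    advantages = [R - v for R, v in zip(returns, values)]
    # Actor: maximize log pi(a_t|s_t) * A_t, so minimize its negative.
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages))
    # Critic: squared error between the n-step return and V(s_t).
    value_loss = sum(adv * adv for adv in advantages)
    # Entropy bonus discourages premature collapse onto a single defense action.
    entropy_bonus = sum(entropies)
    total = policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
    return advantages, total


print(nstep_returns([1.0, 0.0, 1.0], bootstrap_value=2.0, gamma=0.5))  # [1.5, 1.0, 2.0]
```

Each asynchronous worker would compute this loss on its own rollout and apply the resulting gradients to the shared actor-critic network without waiting for the other workers.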
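The middle layer's minimax game follows the adversarial-training formulation of Madry et al.: minimize over θ the expected value of max over ||δ||∞ ≤ ε of L(θ, x + δ, y). PGD approximates the inner maximization with repeated signed-gradient ascent steps on the input. After each step it projects the input back onto the ε-ball around the clean input. The abstract does not say which model or feature space the authors perturb. The sketch below therefore assumes a toy logistic-regression detector on a numeric feature vector, where the input gradient has the closed form (σ(w·x + b) - y)·w. All names and hyperparameters are hypothetical, and the random start often used with PGD is omitted for determinism.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Vector, b: float, x: Vector) -> float:
    """P(attack | x) under a hypothetical logistic-regression detector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(w: Vector, b: float, x: Vector, y: int) -> float:
    """Binary cross-entropy for a label y in {0, 1}."""
    p = min(max(predict(w, b, x), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def pgd_attack(w: Vector, b: float, x: Vector, y: int,
               eps: float = 0.1, alpha: float = 0.03, steps: int = 10) -> Vector:
    """Inner maximization: L-infinity PGD on the input features (no random start)."""
    x_adv = list(x)
    for _ in range(steps):
        # For logistic regression, d(loss)/dx = (p - y) * w.
        coeff = predict(w, b, x_adv) - y
        x_adv = [xa + alpha * sign(coeff * wi) for xa, wi in zip(x_adv, w)]
        # Project back onto the eps-ball: |x_adv_i - x_i| <= eps.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


def adversarial_train_step(w: Vector, b: float, x: Vector, y: int, lr: float = 0.1,
                           eps: float = 0.1, alpha: float = 0.03,
                           steps: int = 10) -> Tuple[Vector, float, Vector]:
    """Outer minimization: one SGD step on the PGD (approximately worst-case) example."""
    x_adv = pgd_attack(w, b, x, y, eps, alpha, steps)
    err = predict(w, b, x_adv) - y  # d(loss)/d(logit)
    w_new = [wi - lr * err * xa for wi, xa in zip(w, x_adv)]
    b_new = b - lr * err
    return w_new, b_new, x_adv


w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.5], 1
x_adv = pgd_attack(w, b, x, y)
print(x_adv, bce_loss(w, b, x, y), bce_loss(w, b, x_adv, y))  # loss rises on x_adv
```

For real traffic features, a practical variant would also clip perturbed values to their valid range and leave categorical fields unperturbed. Those constraints are assumptions here, not details from the abstract.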
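The outer layer uses MAML (Finn et al., 2017). For each task, it takes a gradient step from shared initial parameters θ on a small support set and evaluates the adapted parameters θ' on a query set. It then updates θ with the gradient of the query loss taken through the inner step. To keep the arithmetic exact without an autodiff library, the sketch below assumes hypothetical one-dimensional regression tasks y = a·x with model f(x) = θ·x. For these tasks the loss is s(θ - a)², the inner step is θ' = θ - α·2s(θ - a), and dθ'/dθ = 1 - 2αs, where s is the mean of x² over the relevant set. These toy tasks stand in for the paper's attack-adaptation tasks, which the abstract does not specify. This is full second-order MAML on a toy problem, not the authors' meta-optimizer.

```python
from typing import List, Sequence, Tuple

# (support inputs, query inputs, true slope a); targets are y = a * x.
Task = Tuple[Sequence[float], Sequence[float], float]


def loss_and_grad(theta: float, xs: Sequence[float], slope: float) -> Tuple[float, float, float]:
    """MSE of f(x) = theta * x against y = slope * x, its d/dtheta, and s = mean(x^2)."""
    s = sum(x * x for x in xs) / len(xs)
    return s * (theta - slope) ** 2, 2.0 * s * (theta - slope), s


def adapt(theta: float, support: Sequence[float], slope: float, inner_lr: float = 0.1) -> float:
    """Inner loop: one gradient step on the support set (few-shot adaptation)."""
    _, grad, _ = loss_and_grad(theta, support, slope)
    return theta - inner_lr * grad


def maml_step(theta: float, tasks: List[Task], inner_lr: float = 0.1, meta_lr: float = 0.05) -> float:
    """Outer loop: average query-loss gradient w.r.t. theta, taken through the inner step."""
    meta_grad = 0.0
    for support, query, slope in tasks:
        _, _, s_sup = loss_and_grad(theta, support, slope)
        theta_task = adapt(theta, support, slope, inner_lr)
        _, g_query, _ = loss_and_grad(theta_task, query, slope)
        # Chain rule: d theta_task / d theta = 1 - 2 * inner_lr * s_sup (second-order term kept).
        meta_grad += g_query * (1.0 - 2.0 * inner_lr * s_sup)
    return theta - meta_lr * meta_grad / len(tasks)


train_tasks = [([1.0, 2.0], [1.0, -1.0], 1.0), ([1.0, 2.0], [1.0, -1.0], 3.0)]
theta = 0.0
for _ in range(500):
    theta = maml_step(theta, train_tasks)
print(round(theta, 3))  # approaches 2.0 for this symmetric pair of tasks
```

With these settings, one inner step moves θ halfway toward a task's slope, and meta-training drives θ to about 2.0, between the two training slopes. Consider an unseen task with slope 4.0: a single adaptation step from the meta-learned θ lowers the query loss from roughly 4.0 to roughly 1.0.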

Zhang Peng, Zhao Guosheng, Wu Xiaosheng. Adaptive dynamic defense model based on POMDP and adversarial meta-reinforcement learning[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0260033.

