计算机工程 (Computer Engineering)

A 6-DOF Air Combat Decision-Making Method Based on Progressive Deep Reinforcement Learning

Published: 2025-08-27

Abstract: Six-degree-of-freedom (6-DOF) unmanned aerial vehicle (UAV) air combat is a highly challenging scenario, involving high-dimensional continuous state and action spaces as well as nonlinear dynamics. To address decision-making in such scenarios, this paper proposes a Progressive Multi-objective Strategy Optimization (PMSO) algorithm, which improves policy learning by dynamically adjusting the granularity of the action space and incorporating multi-objective reward functions. To overcome the difficulty of decision-making, and even the failure to learn an effective policy, caused by the high dimensionality of the continuous action space and the resulting overly large search space, a progressive discretization mechanism is designed. In the initial stage, coarse-grained discrete action commands are adopted to explore the policy space quickly, exploiting the local similarity in the control effects of neighboring action commands to shrink the action search space. As training iterations progress and task difficulty increases, the discretization granularity is gradually refined, thereby preserving the control precision of the action commands. To address the sparse-reward problem prevalent in air combat tasks, multi-objective reward functions covering angle, distance, and altitude are designed; the coordination of these rewards guides the algorithm to better understand the impact of current action commands on the overall air combat task and accelerates convergence. In simulation experiments on randomized air combat scenarios covering advantageous, neutral, and disadvantageous initial situations, the proposed PMSO algorithm converges rapidly and learns effective air combat policies, outperforming existing air combat algorithms in both convergence speed and the effectiveness of the learned policies.
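The abstract provides no code, but the progressive discretization mechanism it describes can be sketched as a coarse-to-fine schedule over a discretized action grid. The Python sketch below is a minimal illustration under stated assumptions: the class name `ProgressiveActionGrid`, the bin-count schedule, and the progress-based staging are hypothetical choices, not the authors' implementation.

```python
import numpy as np

class ProgressiveActionGrid:
    """Illustrative coarse-to-fine discretization of continuous controls.

    Each control channel is split into n evenly spaced levels; n grows
    with training progress, so early training searches a small action
    set while later training recovers fine control precision.
    """

    def __init__(self, low, high, bins_schedule=(5, 9, 17, 33)):
        self.low = np.asarray(low, dtype=np.float64)    # per-channel lower bounds
        self.high = np.asarray(high, dtype=np.float64)  # per-channel upper bounds
        self.bins_schedule = bins_schedule              # coarse -> fine (assumed values)

    def n_bins(self, progress):
        """Bin count for a training-progress value in [0, 1]."""
        stage = min(int(progress * len(self.bins_schedule)),
                    len(self.bins_schedule) - 1)
        return self.bins_schedule[stage]

    def decode(self, indices, progress):
        """Map per-channel bin indices to continuous action commands."""
        n = self.n_bins(progress)
        frac = np.asarray(indices, dtype=np.float64) / (n - 1)
        return self.low + frac * (self.high - self.low)


# Example: 4 channels (aileron, elevator, rudder, throttle); bounds assumed.
grid = ProgressiveActionGrid(low=[-1, -1, -1, 0], high=[1, 1, 1, 1])
print(grid.n_bins(0.1), grid.n_bins(0.9))       # 5 levels early, 33 levels late
print(grid.decode([2, 2, 2, 4], progress=0.1))  # centered stick, full throttle
```

Because the policy only ever chooses among the currently active levels, the early search space stays small, while refining the grid later narrows the gap to fully continuous control.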
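Likewise, the multi-objective reward can be illustrated as a weighted sum of dense angle, distance, and altitude terms that replaces a sparse win/loss signal. The following sketch assumes a particular form: the geometry conventions (antenna train angle and aspect angle), the engagement-distance band, the altitude scale, and the weights are all illustrative and are not taken from the paper.

```python
import numpy as np

def shaped_reward(ata_deg, aa_deg, dist_m, delta_alt_m,
                  weights=(0.4, 0.4, 0.2),
                  dist_band=(1000.0, 3000.0), alt_scale=1000.0):
    """Hypothetical dense shaping reward: angle + distance + altitude.

    ata_deg:     antenna train angle, own nose to opponent, in [0, 180]
    aa_deg:      aspect angle, opponent tail to own aircraft, in [0, 180]
    dist_m:      slant range to the opponent in meters
    delta_alt_m: own altitude minus opponent altitude in meters
    """
    # Angle term: 1 when pointing directly at the opponent's tail,
    # 0 when the geometry is fully reversed.
    r_angle = 1.0 - (ata_deg + aa_deg) / 360.0

    # Distance term: flat inside the preferred engagement band,
    # decaying exponentially outside it.
    d_lo, d_hi = dist_band
    if dist_m < d_lo:
        r_dist = float(np.exp(-(d_lo - dist_m) / d_lo))
    elif dist_m > d_hi:
        r_dist = float(np.exp(-(dist_m - d_hi) / d_hi))
    else:
        r_dist = 1.0

    # Altitude term: bounded advantage for holding a height (energy) edge.
    r_alt = float(np.clip(delta_alt_m / alt_scale, -1.0, 1.0))

    w_a, w_d, w_h = weights
    return w_a * r_angle + w_d * r_dist + w_h * r_alt


# Example: good pointing geometry, inside the band, 500 m above the opponent.
print(shaped_reward(ata_deg=20, aa_deg=30, dist_m=2000, delta_alt_m=500))
```

The weighted-sum form makes each component's contribution easy to inspect in isolation, which is one common way such angle/distance/altitude rewards are coordinated to accelerate convergence.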