Computer Engineering, 2025, Vol. 51, Issue (5): 73-82. doi: 10.19678/j.issn.1000-3428.0069850

• Artificial Intelligence and Pattern Recognition •

Unmanned Aerial Vehicle Formation Control Based on MADDPG with Integrated Curriculum Learning

WU Kaifeng, LIU Lei, LIU Chen, LIANG Chengqing

  1. School of Mathematics, Hohai University, Nanjing 211100, Jiangsu, China
  • Received: 2024-05-16 Revised: 2024-09-09 Online: 2025-05-15 Published: 2024-10-10
  • Corresponding author: LIU Lei, E-mail: liulei_hust@163.com
  • Supported by:
    General Program of the Natural Science Foundation of Hebei Province (A2023209002).

Abstract: The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm extends the Deep Deterministic Policy Gradient (DDPG) algorithm to multi-agent environments. Each agent considers not only its own observations and actions but also the policies of the other agents when making collective decisions, which improves performance and stability in complex, changing environments. Building on the MADDPG framework, this study designs the network structure, state space, action space, and reward function needed for Unmanned Aerial Vehicle (UAV) formation control. To address the difficulty of convergence in multi-agent algorithms, curriculum reinforcement learning is used to decompose the task into stages during training. A progressively more demanding reward function is designed for the task of each stage, and dense rewards are constructed using the artificial potential field concept, which greatly reduces the training difficulty. Ablation and comparison experiments in a self-built Software in the Loop (SITL) simulation environment verify the effectiveness and stability of the MADDPG algorithm in multi-agent environments. Finally, real-world flight experiments further verify the practicality of the designed algorithm.

Key words: Unmanned Aerial Vehicle (UAV) formation, deep reinforcement learning, Multi-Agent Deep Deterministic Policy Gradient (MADDPG), curriculum learning, neural network
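
The abstract does not give the reward formulas, so the following is only a minimal sketch of how a staged, potential-field-based dense reward might look. It assumes the textbook quadratic attractive potential and inverse-distance repulsive potential; the function names, the gains `k_att` and `k_rep`, the influence radius `d0`, and the two-stage split are all hypothetical and are not taken from the paper.

```python
import math

def attractive_reward(pos, target, k_att=1.0):
    """Dense reward from a quadratic attractive potential (assumed form).

    The reward is the negative potential, so it rises smoothly
    as the UAV approaches its formation slot.
    """
    d = math.dist(pos, target)
    return -0.5 * k_att * d * d

def repulsive_penalty(pos, others, d0=2.0, k_rep=1.0):
    """Penalty from the classic repulsive potential (assumed form).

    Only neighbours closer than the influence radius d0 contribute.
    """
    penalty = 0.0
    for other in others:
        d = max(math.dist(pos, other), 1e-6)  # avoid division by zero
        if d < d0:
            penalty += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return penalty

def staged_reward(stage, pos, target, others):
    """Hypothetical two-stage curriculum.

    Stage 0 rewards only reaching the slot; stage 1 and later
    additionally penalise proximity to other UAVs.
    """
    reward = attractive_reward(pos, target)
    if stage >= 1:
        reward -= repulsive_penalty(pos, others)
    return reward
```

Under these assumptions, `staged_reward(0, (0.0, 0.0), (3.0, 4.0), [(1.0, 0.0)])` ignores the nearby UAV, while the same call with `stage=1` subtracts the repulsive term, so the task becomes harder only after the simpler stage has been learned.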

CLC Number: