Unmanned Aerial Vehicle Formation Control Based on MADDPG with Integrated Curriculum Learning

doi:10.19678/j.issn.1000-3428.0069850

Abstract


The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm extends the Deep Deterministic Policy Gradient (DDPG) algorithm to multi-agent environments. In MADDPG, each agent considers not only its own observations and actions but also the policies of the other agents, which leads to better collective decisions. During training, each agent's critic is conditioned on the observations and actions of all agents. This design markedly improves performance and stability in complex, changing environments. Building on the MADDPG framework, this study designed the network structure, state space, action space, and reward function of the algorithm to address the problem of Unmanned Aerial Vehicle (UAV) formation control. Multi-agent algorithms are often difficult to make converge. To address this, curriculum reinforcement learning was used to decompose training into stages, and a progressively refined reward function was designed for the task of each stage. Dense rewards derived from the artificial potential field concept further reduced the training difficulty. Ablation and control experiments in a self-built Software-in-the-Loop (SITL) simulation environment demonstrated the effectiveness and stability of the MADDPG algorithm in multi-agent environments. Real-world flight experiments then verified the practicality of the designed algorithm.

Key words: Unmanned Aerial Vehicle (UAV) formation, deep reinforcement learning, Multi-Agent Deep Deterministic Policy Gradient (MADDPG), curriculum learning, neural network
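MADDPG adds a centralized-critic update to DDPG, and that update can be sketched in plain Python. This is a minimal, non-authoritative illustration of the standard MADDPG formulation (Lowe et al., 2017). It does not reproduce the network or hyperparameters used in this paper, because the abstract does not give them. The helper names (`joint_input`, `td_target`, `soft_update`) and the toy stand-ins `toy_actor` and `toy_critic` are hypothetical. In practice, both the actors and the critics would be neural networks. Two further choices are assumptions taken from common DDPG practice: the bootstrap term is dropped at episode end, and target parameters are soft-updated with a rate `tau`.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Policy = Callable[[Vector], Vector]  # actor: local observation o_j -> action a_j
Critic = Callable[[Vector], float]   # critic: joint (observations, actions) -> Q value


def joint_input(observations: Sequence[Vector], actions: Sequence[Vector]) -> Vector:
    """Concatenate all agents' observations, then all agents' actions.

    This is the centralized part of MADDPG: during training, agent i's critic
    sees every agent's observation and action, while its actor sees only o_i.
    """
    x: Vector = []
    for o in observations:
        x.extend(o)
    for a in actions:
        x.extend(a)
    return x


def td_target(reward: float, done: bool, gamma: float,
              next_observations: Sequence[Vector],
              target_actors: Sequence[Policy],
              target_critic: Critic) -> float:
    """y_i = r_i + gamma * Q'_i(o'_1..o'_N, mu'_1(o'_1)..mu'_N(o'_N)).

    Next actions come from every agent's target actor; the bootstrap term
    is dropped at episode end (a common convention, assumed here).
    """
    next_actions = [mu(o) for mu, o in zip(target_actors, next_observations)]
    q_next = target_critic(joint_input(next_observations, next_actions))
    return reward + (0.0 if done else gamma * q_next)


def soft_update(target: Vector, online: Vector, tau: float) -> Vector:
    """Polyak averaging of parameters: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


def toy_actor(o: Vector) -> Vector:
    """Hypothetical stand-in for a target actor network."""
    return [0.5 * v for v in o]


def toy_critic(x: Vector) -> float:
    """Hypothetical stand-in for a target critic network."""
    return sum(x)


if __name__ == "__main__":
    # Three UAVs, each with a 2-D observation and a 2-D action (illustrative only).
    next_obs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    y = td_target(reward=-1.0, done=False, gamma=0.9,
                  next_observations=next_obs,
                  target_actors=[toy_actor] * 3,
                  target_critic=toy_critic)
    print(f"TD target: {y:.2f}")
    print(soft_update(target=[0.0, 0.0], online=[1.0, 2.0], tau=0.1))
```

Run as a script, it prints `TD target: 4.40` followed by `[0.1, 0.2]`. The target is -1 + 0.9 × 6, because the joint next-state input sums to 6. Each actor stays decentralized (it sees only its own observation), and only the critic is given the joint input. This split is what lets the trained UAVs act on local information at execution time.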

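Chinese abstract (translated):

The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is extended from the Deep Deterministic Policy Gradient (DDPG) algorithm and is designed specifically for multi-agent environments. Each agent in the algorithm considers not only its own observations and actions but also the policies of the other agents, enabling better collective decision-making. This design markedly improves performance and stability in complex, changing environments. Based on the MADDPG framework, the network structure, state space, action space, and reward function of the algorithm were designed to achieve UAV formation control. Multi-agent algorithms are difficult to converge. To address this, curriculum reinforcement learning was used during training to decompose the task into stages, and a hierarchically progressive reward function was designed for the task of each stage. Dense rewards were designed using the artificial potential field concept, which greatly reduced the training difficulty. Ablation and control experiments in a self-built Software-in-the-Loop (SITL) simulation environment verified the effectiveness and stability of the MADDPG algorithm in multi-agent environments. Finally, real-aircraft experiments further verified the practicality of the designed algorithm in a real environment.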
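The abstract describes two reward techniques: stage-wise curriculum training with progressively richer rewards, and dense rewards derived from an artificial potential field (APF). The sketch below shows one way such a reward could be structured. It is not the authors' reward function, whose terms the abstract does not specify. The following are all hypothetical choices:

- the gains `k_att` and `k_rep` and the safety radius `d0`;
- the two-stage schedule (stage 1: fly to the assigned formation slot; stage 2: additionally keep clear of teammates);
- the mean-reward threshold that `CurriculumScheduler` uses to advance stages.

The attractive and repulsive potentials follow the classic APF forms.

```python
import math
from collections import deque
from typing import Deque, List, Sequence, Tuple

Point = Tuple[float, float, float]


def attractive_potential(pos: Point, goal: Point, k_att: float = 1.0) -> float:
    """U_att = 0.5 * k_att * ||pos - goal||^2 (classic APF attraction)."""
    return 0.5 * k_att * math.dist(pos, goal) ** 2


def repulsive_potential(pos: Point, other: Point,
                        d0: float = 2.0, k_rep: float = 1.0) -> float:
    """U_rep = 0.5 * k_rep * (1/d - 1/d0)^2 inside safety radius d0, else 0."""
    d = max(math.dist(pos, other), 1e-6)  # guard against division by zero
    if d >= d0:
        return 0.0
    return 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2


def staged_reward(positions: Sequence[Point], goals: Sequence[Point],
                  i: int, stage: int) -> float:
    """Dense reward for UAV i under a hypothetical two-stage curriculum.

    Stage 1: attraction to UAV i's goal only.
    Stage 2 and later: also penalise closeness to teammates (repulsion).
    """
    u = attractive_potential(positions[i], goals[i])
    if stage >= 2:
        u += sum(repulsive_potential(positions[i], p)
                 for j, p in enumerate(positions) if j != i)
    return -u  # lower potential energy -> higher reward


class CurriculumScheduler:
    """Advance to the next stage once the recent mean episode reward passes a threshold."""

    def __init__(self, thresholds: List[float], window: int = 100):
        self.thresholds = thresholds  # thresholds[k] gates the move from stage k+1 to k+2
        self.stage = 1
        self.recent: Deque[float] = deque(maxlen=window)

    def record(self, episode_reward: float) -> int:
        """Log one episode's reward and return the (possibly updated) stage."""
        self.recent.append(episode_reward)
        window_full = len(self.recent) == self.recent.maxlen
        if (self.stage <= len(self.thresholds) and window_full
                and sum(self.recent) / len(self.recent) >= self.thresholds[self.stage - 1]):
            self.stage += 1
            self.recent.clear()  # re-measure performance on the harder stage
        return self.stage


if __name__ == "__main__":
    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    goals = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    print(staged_reward(positions, goals, i=0, stage=1))  # attraction only
    print(staged_reward(positions, goals, i=0, stage=2))  # plus repulsion from UAV 1
```

For UAV 0 the script prints `-12.5` and then `-12.625`. The goal is 5 m away, and in stage 2 UAV 1 sits 1 m away, inside the 2 m safety radius. Using the negative potential as the reward gives a signal at every step rather than only on success, which is the point of a dense reward. Potential-difference shaping, Φ(s') − Φ(s), is a common alternative.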

WU Kaifeng, LIU Lei, LIU Chen, LIANG Chengqing. Unmanned Aerial Vehicle Formation Control Based on MADDPG with Integrated Curriculum Learning[J]. Computer Engineering, 2025, 51(5): 73-82.

吴凯峰, 刘磊, 刘晨, 梁成庆. 基于融合课程思想MADDPG的无人机编队控制[J]. 计算机工程, 2025, 51(5): 73-82.


URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069850

https://www.ecice06.com/EN/Y2025/V51/I5/73

Figures/Tables (14)

Fig.1 Schematic diagram of UAV formation

Fig.2 Structure of the DDPG algorithm

Fig.3 Structure of the MADDPG algorithm

Fig.4 Experimental environment

Fig.5 Initial and target positions of the UAV

Fig.6 Training reward graphs of MADDPG using curriculum reinforcement learning

Fig.7 Training reward graph of MADDPG without using curriculum reinforcement learning

Fig.8 Trajectory diagram of three UAVs

Fig.9 Reward graphs for different discount factors

Fig.10 Reward graphs for different learning rates

Fig.11 Reward graphs for different target points

Fig.12 UAV platform

Fig.13 Formation flight of the real aircraft

