
Computer Engineering ›› 2026, Vol. 52 ›› Issue (4): 90-102. doi: 10.19678/j.issn.1000-3428.0070197

• Computational Intelligence and Pattern Recognition •

Path Following Method of Six-DOF Fixed-Wing UAV Based on Hierarchical Deep Reinforcement Learning

JIANG Taimin1, TAN Tai1, LI Hui1,2, ZHANG Jianwei1,2,*, HUA Chenhao1, DONG Zhiqiang3

  1. College of Computer Science, Sichuan University, Chengdu 610065, Sichuan, China
    2. National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, Sichuan, China
    3. System Engineering Research Institute, China State Shipbuilding Corporation Limited, Beijing 100094, China
  • Received:2024-08-05 Revised:2024-09-26 Online:2026-04-15 Published:2024-12-10
  • Contact: ZHANG Jianwei

  • About the authors:

    JIANG Taimin, male, master's student; his main research interest is reinforcement learning

    TAN Tai, master's student

    LI Hui, professor

    ZHANG Jianwei (corresponding author), researcher and doctoral supervisor

    HUA Chenhao, master's student

    DONG Zhiqiang, senior engineer

  • Supported by:
    National Natural Science Foundation of China (U20A20161)

Abstract:

Path following for fixed-wing Unmanned Aerial Vehicles (UAVs) is a key problem in the UAV domain. Under six-Degrees of Freedom (DOF) dynamics, a fixed-wing UAV is a nonlinear system whose high-dimensional continuous state and action spaces make it difficult to control and guide. A novel hierarchical reinforcement learning framework is proposed to address these challenges in fixed-wing UAV path following. The core of the framework is to decompose path following into separate control and guidance problems. For the control problem, a Proximal Policy Optimization with Differential Compensator (PPO-DC) algorithm is introduced, which achieves faster convergence and better control stability. Experimental results show that PPO-DC converges approximately 2.5 times faster than the standard PPO algorithm and achieves higher control accuracy. Moreover, models trained on a specific control task adapt well when transferred to other control tasks. For the guidance problem, a guidance model of the fixed-wing UAV is established and an effective guidance strategy is proposed. In addition, a cumulative reward design is proposed to handle the sequential learning of multiple objectives in reinforcement learning tasks, ensuring that training converges effectively. Experimental results show that the proposed hierarchical reinforcement learning framework performs well in various complex path-following scenarios, keeping the average path-following error of the fixed-wing UAV below 20 m.

Key words: hierarchical reinforcement learning, fixed-wing Unmanned Aerial Vehicle (UAV), six-Degrees of Freedom (DOF), path following, UAV control, UAV guidance
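The abstract does not give the exact form of the differential compensator in PPO-DC. The following is a minimal sketch of one plausible reading, assuming the compensator adds a derivative-damping term (PD-style) to the policy's raw action based on the tracked control error; the class name, the gains `kd` and `dt`, and the error signal are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class DifferentialCompensator:
    """Hypothetical sketch: damp rapid changes in the control error by
    subtracting a scaled finite-difference derivative of the error from
    the policy's raw action, PD-controller style."""

    def __init__(self, kd: float = 0.1, dt: float = 0.02):
        self.kd = kd            # derivative gain (assumed)
        self.dt = dt            # simulation step in seconds (assumed)
        self.prev_error = None  # error from the previous step

    def compensate(self, raw_action, error):
        """Return the compensated action for the current step."""
        error = np.asarray(error, dtype=float)
        if self.prev_error is None:
            d_error = np.zeros_like(error)  # no history on the first step
        else:
            d_error = (error - self.prev_error) / self.dt
        self.prev_error = error
        return raw_action - self.kd * d_error

# Usage: wrap each action sampled from the PPO policy before it is
# applied to the six-DOF UAV model.
comp = DifferentialCompensator(kd=0.1, dt=0.02)
a1 = comp.compensate(1.0, 0.5)  # first step: no derivative term
a2 = comp.compensate(1.0, 0.6)  # error rising, action is damped
```

The intuition is that the derivative term penalizes fast-growing tracking error, which would plausibly smooth the control signal and speed up convergence, consistent with the gains the abstract reports for PPO-DC over plain PPO.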
