
Computer Engineering ›› 2026, Vol. 52 ›› Issue (4): 90-102. doi: 10.19678/j.issn.1000-3428.0070197

• Computational Intelligence and Pattern Recognition •

Path Following Method of Six-DOF Fixed-Wing UAV Based on Hierarchical Deep Reinforcement Learning

JIANG Taimin1, TAN Tai1, LI Hui1,2, ZHANG Jianwei1,2,*, HUA Chenhao1, DONG Zhiqiang3

  1. College of Computer Science, Sichuan University, Chengdu 610065, Sichuan, China
    2. National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, Sichuan, China
    3. System Engineering Research Institute, China State Shipbuilding Corporation Limited, Beijing 100094, China
  • Received:2024-08-05 Revised:2024-09-26 Online:2026-04-15 Published:2024-12-10
  • Contact: ZHANG Jianwei

  • About the authors:

    JIANG Taimin, male, master's student; his main research interest is reinforcement learning

    TAN Tai, master's student

    LI Hui, professor

    ZHANG Jianwei (corresponding author), researcher and doctoral supervisor

    HUA Chenhao, master's student

    DONG Zhiqiang, senior engineer

  • Supported by:
    National Natural Science Foundation of China (U20A20161)

Abstract:

Path following for fixed-wing Unmanned Aerial Vehicles (UAVs) is a key problem in the UAV domain. Under six-Degrees of Freedom (DOF) dynamics, a fixed-wing UAV is a nonlinear system whose high-dimensional continuous state and action spaces make it difficult to control and guide. A novel hierarchical reinforcement learning framework is proposed to address these challenges in fixed-wing UAV path following. The core of the framework is to decompose path following into separate control and guidance problems. For the control problem, a Proximal Policy Optimization with Differential Compensator (PPO-DC) algorithm is introduced, which achieves faster convergence and better control stability. Experimental results show that PPO-DC converges approximately 2.5 times faster than the standard PPO algorithm and achieves higher control accuracy. Moreover, models trained on a specific control task adapt well when transferred to other control tasks. For the guidance problem, a guidance model of the fixed-wing UAV is established and an effective guidance strategy is proposed. In addition, a cumulative reward design is proposed to handle the sequential learning of multiple objectives in reinforcement learning tasks, ensuring that training converges effectively. Experimental results show that the proposed hierarchical reinforcement learning framework performs well in various complex path-following scenarios, keeping the average path-following error of the fixed-wing UAV below 20 m.

Key words: hierarchical reinforcement learning, fixed-wing Unmanned Aerial Vehicle (UAV), six-Degrees of Freedom (DOF), path following, UAV control, UAV guidance
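The abstract does not give the exact form of the differential compensator in PPO-DC. The following is a minimal sketch of one plausible reading, assuming the compensator adds a derivative-damping term (PD-style) to the policy's raw action based on the tracked control error; the class name, the gains `kd` and `dt`, and the error signal are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class DifferentialCompensator:
    """Hypothetical sketch: damp rapid changes in the control error by
    subtracting a scaled finite-difference derivative of the error from
    the policy's raw action, PD-controller style."""

    def __init__(self, kd: float = 0.1, dt: float = 0.02):
        self.kd = kd            # derivative gain (assumed)
        self.dt = dt            # simulation step in seconds (assumed)
        self.prev_error = None  # error from the previous step

    def compensate(self, raw_action, error):
        """Return the compensated action for the current step."""
        error = np.asarray(error, dtype=float)
        if self.prev_error is None:
            d_error = np.zeros_like(error)  # no history on the first step
        else:
            d_error = (error - self.prev_error) / self.dt
        self.prev_error = error
        return raw_action - self.kd * d_error

# Usage: wrap each action sampled from the PPO policy before it is
# applied to the six-DOF UAV model.
comp = DifferentialCompensator(kd=0.1, dt=0.02)
a1 = comp.compensate(1.0, 0.5)  # first step: no derivative term
a2 = comp.compensate(1.0, 0.6)  # error rising, action is damped
```

The intuition is that the derivative term penalizes fast-growing tracking error, which would plausibly smooth the control signal and speed up convergence, consistent with the gains the abstract reports for PPO-DC over plain PPO.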
