
Computer Engineering

   

UAV Path Planning with Local Information-Enhanced Proximal Policy Optimization Algorithm

  

Published: 2026-04-20


Abstract: Autonomous path planning is key to the success of UAV missions in complex environments: the UAV must plan globally efficient flight paths while also responding to changes in the local environment. Planning complete paths for arbitrary start-goal combinations in the initial static environment, while adjusting for obstacle avoidance in local regions, requires an effective trade-off between global path optimality and local obstacle avoidance capability. In complex three-dimensional environments, the search time of existing heuristic algorithms grows exponentially with spatial resolution, making it difficult to meet real-time requirements. Gradient-based deep reinforcement learning methods, on the other hand, often suffer from perceptual aliasing on unstructured mountainous terrain because they lack local perception guidance, leading to unstable training convergence and susceptibility to local-extremum traps. This paper proposes a Local Information-Enhanced Proximal Policy Optimization (LIE-PPO) algorithm and designs a state space that fuses global position information, goal-relative information, and a local perception window, enabling the agent to balance long-term planning with local decision-making and thereby address path planning in high-dimensional feature spaces. For the path planning problem, the algorithm adopts a 26-neighborhood discrete action space and a multi-objective reward function that jointly considers path smoothness, safety, and efficiency. This guides the agent to learn an efficient and safe path-selection strategy, so that a pre-trained model can generate feasible, near-optimal paths online between arbitrary start and goal points.
Experimental results show that, over repeated tests with random start and goal points in a static environment, the average path length planned by the proposed algorithm differs from the A* result by less than 7%, indicating near-global optimality. Compared with the standard proximal policy optimization algorithm, convergence is approximately 1.6 times faster, demonstrating faster convergence and higher training stability. When unknown obstacles are present, the algorithm can still plan feasible paths, exhibiting good environmental adaptability.
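The 26-neighborhood action space and the enhanced state described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the action set is the standard set of unit grid moves in {-1, 0, 1}^3 excluding the zero move, and the local perception window size (`window=2`, i.e. a 5x5x5 cube) is an assumption, since the abstract does not specify it.

```python
import numpy as np

# 26-neighborhood discrete action space: every unit grid move
# (dx, dy, dz) in {-1, 0, 1}^3 except staying in place.
ACTIONS = [(dx, dy, dz)
           for dx in (-1, 0, 1)
           for dy in (-1, 0, 1)
           for dz in (-1, 0, 1)
           if (dx, dy, dz) != (0, 0, 0)]
assert len(ACTIONS) == 26

def build_state(pos, goal, occupancy, window=2):
    """Assemble the enhanced state: global position, goal-relative
    vector, and a flattened local perception window of the occupancy
    grid centered on the UAV. The window half-width is an assumption."""
    pos = np.asarray(pos)
    goal = np.asarray(goal)
    # Pad with 1 (treated as obstacle) so the window stays valid
    # near the grid boundary.
    padded = np.pad(occupancy, window, constant_values=1)
    c = pos + window  # center index in the padded grid
    local = padded[c[0]-window:c[0]+window+1,
                   c[1]-window:c[1]+window+1,
                   c[2]-window:c[2]+window+1]
    return np.concatenate([pos, goal - pos, local.ravel()]).astype(np.float32)
```

With `window=2` the state has 3 + 3 + 125 = 131 dimensions; the actual dimensionality in the paper depends on the window size the authors chose.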

摘要: 无人机的自主路径规划是确保其复杂环境下任务成功的关键,要求其既能规划出全局高效的飞行路径,又能应对局部环境的变化。在初始静态环境下为不同起终点组合进行完整规划,同时在局部区域内进行避障调整需要有效权衡全局路径最优性与局部避障能力。现有启发式算法在三维复杂环境下的搜索时间随空间分辨率呈指数级增长,难以满足实时性需求;而基于梯度的深度强化学习方法在处理非结构化山地地形时,常因缺乏局部感知引导而面临“感知混叠”问题,导致训练收敛不稳定且易陷入局部极值陷阱。提出一种基于局部信息增强的近端策略优化算法(LIE-PPO),设计融合全局位置、目标点相对信息及局部感知窗口的状态空间,使智能体能够同时兼顾长远规划与近端决策,以解决高维特征下路径规划问题。针对路径规划问题,算法采用26邻域离散动作空间,设计综合考虑路径离散曲率、安全性与效率的多目标奖励函数,引导智能体学习高效安全路径选择策略,基于预训练模型在线快速生成任意起终点间可行最优路径。实验结果表明,在进行多次随机起终点测试后,提出的算法在静态环境中规划的平均路径长度与A*算法结果相比,差距小于7%,具有近似全局最优性;相较于标准近端策略优化算法收敛速度提升约1.6倍,展现出更快的收敛速度和更高的训练稳定性。在存在未知障碍物的场景下,仍能规划出可行路径,表现出良好的环境适应性。
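A multi-objective step reward of the kind the abstract describes, combining efficiency, safety, and smoothness, might look like the following sketch. The three terms, the weights, and the safety distance `d_safe` are all assumptions for illustration; the paper's exact reward formulation may differ.

```python
import numpy as np

def step_reward(prev_pos, pos, next_pos, goal, dist_to_obstacle,
                w_eff=1.0, w_safe=0.5, w_smooth=0.2, d_safe=2.0):
    """Hedged sketch of a multi-objective reward (weights assumed):
    - efficiency: progress toward the goal made by this step
    - safety:     penalty for flying closer to an obstacle than d_safe
    - smoothness: penalty proportional to the turn angle"""
    prev_pos, pos, next_pos, goal = map(
        np.asarray, (prev_pos, pos, next_pos, goal))
    # Efficiency: reduction in distance-to-goal over the step.
    progress = np.linalg.norm(goal - pos) - np.linalg.norm(goal - next_pos)
    # Safety: linear penalty inside the safety margin, zero outside.
    safety = -max(0.0, d_safe - dist_to_obstacle)
    # Smoothness: zero for a straight continuation, negative for turns.
    v1, v2 = pos - prev_pos, next_pos - pos
    cos_turn = np.dot(v1, v2) / (
        np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    smooth = -(1.0 - cos_turn)
    return w_eff * progress + w_safe * safety + w_smooth * smooth
```

Under this shaping, a straight step toward the goal with safe clearance earns positive reward, while a sharp turn or a step inside the safety margin is penalized, which is the balance of smoothness, safety, and efficiency the abstract attributes to the reward design.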