
Computer Engineering

   

UAV Path Planning with Local Information-Enhanced Proximal Policy Optimization Algorithm

  

Published: 2026-04-20


Abstract: Autonomous path planning is key to the success of UAV missions in complex environments: the UAV must plan globally efficient flight paths while also responding to changes in the local environment. Planning complete paths for arbitrary start-goal combinations in the initial static environment, while adjusting for obstacle avoidance in local regions, requires an effective trade-off between global path optimality and local obstacle avoidance capability. In complex three-dimensional environments, the search time of existing heuristic algorithms grows exponentially with spatial resolution, making it difficult to meet real-time requirements. Gradient-based deep reinforcement learning methods, on the other hand, often suffer from perceptual aliasing on unstructured mountainous terrain because they lack local perception guidance, leading to unstable training convergence and susceptibility to local-extremum traps. This paper proposes a Local Information-Enhanced Proximal Policy Optimization (LIE-PPO) algorithm and designs a state space that fuses global position information, goal-relative information, and a local perception window, enabling the agent to balance long-term planning with local decision-making and thereby address path planning in high-dimensional feature spaces. For the path planning problem, the algorithm adopts a 26-neighborhood discrete action space and a multi-objective reward function that jointly considers path smoothness, safety, and efficiency. This guides the agent to learn an efficient and safe path-selection strategy, so that a pre-trained model can generate feasible, near-optimal paths online between arbitrary start and goal points.
Experimental results show that, over repeated tests with random start and goal points in a static environment, the average path length planned by the proposed algorithm differs from the A* result by less than 7%, indicating near-global optimality. Compared with the standard proximal policy optimization algorithm, convergence is approximately 1.6 times faster, demonstrating faster convergence and higher training stability. When unknown obstacles are present, the algorithm can still plan feasible paths, exhibiting good environmental adaptability.
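The 26-neighborhood action space and the enhanced state described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the action set is the standard set of unit grid moves in {-1, 0, 1}^3 excluding the zero move, and the local perception window size (`window=2`, i.e. a 5x5x5 cube) is an assumption, since the abstract does not specify it.

```python
import numpy as np

# 26-neighborhood discrete action space: every unit grid move
# (dx, dy, dz) in {-1, 0, 1}^3 except staying in place.
ACTIONS = [(dx, dy, dz)
           for dx in (-1, 0, 1)
           for dy in (-1, 0, 1)
           for dz in (-1, 0, 1)
           if (dx, dy, dz) != (0, 0, 0)]
assert len(ACTIONS) == 26

def build_state(pos, goal, occupancy, window=2):
    """Assemble the enhanced state: global position, goal-relative
    vector, and a flattened local perception window of the occupancy
    grid centered on the UAV. The window half-width is an assumption."""
    pos = np.asarray(pos)
    goal = np.asarray(goal)
    # Pad with 1 (treated as obstacle) so the window stays valid
    # near the grid boundary.
    padded = np.pad(occupancy, window, constant_values=1)
    c = pos + window  # center index in the padded grid
    local = padded[c[0]-window:c[0]+window+1,
                   c[1]-window:c[1]+window+1,
                   c[2]-window:c[2]+window+1]
    return np.concatenate([pos, goal - pos, local.ravel()]).astype(np.float32)
```

With `window=2` the state has 3 + 3 + 125 = 131 dimensions; the actual dimensionality in the paper depends on the window size the authors chose.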

摘要: 无人机的自主路径规划是确保其复杂环境下任务成功的关键,要求其既能规划出全局高效的飞行路径,又能应对局部环境的变化。在初始静态环境下为不同起终点组合进行完整规划,同时在局部区域内进行避障调整需要有效权衡全局路径最优性与局部避障能力。现有启发式算法在三维复杂环境下的搜索时间随空间分辨率呈指数级增长,难以满足实时性需求;而基于梯度的深度强化学习方法在处理非结构化山地地形时,常因缺乏局部感知引导而面临“感知混叠”问题,导致训练收敛不稳定且易陷入局部极值陷阱。提出一种基于局部信息增强的近端策略优化算法(LIE-PPO),设计融合全局位置、目标点相对信息及局部感知窗口的状态空间,使智能体能够同时兼顾长远规划与近端决策,以解决高维特征下路径规划问题。针对路径规划问题,算法采用26邻域离散动作空间,设计综合考虑路径离散曲率、安全性与效率的多目标奖励函数,引导智能体学习高效安全路径选择策略,基于预训练模型在线快速生成任意起终点间可行最优路径。实验结果表明,在进行多次随机起终点测试后,提出的算法在静态环境中规划的平均路径长度与A*算法结果相比,差距小于7%,具有近似全局最优性;相较于标准近端策略优化算法收敛速度提升约1.6倍,展现出更快的收敛速度和更高的训练稳定性。在存在未知障碍物的场景下,仍能规划出可行路径,表现出良好的环境适应性。
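A multi-objective step reward of the kind the abstract describes, combining efficiency, safety, and smoothness, might look like the following sketch. The three terms, the weights, and the safety distance `d_safe` are all assumptions for illustration; the paper's exact reward formulation may differ.

```python
import numpy as np

def step_reward(prev_pos, pos, next_pos, goal, dist_to_obstacle,
                w_eff=1.0, w_safe=0.5, w_smooth=0.2, d_safe=2.0):
    """Hedged sketch of a multi-objective reward (weights assumed):
    - efficiency: progress toward the goal made by this step
    - safety:     penalty for flying closer to an obstacle than d_safe
    - smoothness: penalty proportional to the turn angle"""
    prev_pos, pos, next_pos, goal = map(
        np.asarray, (prev_pos, pos, next_pos, goal))
    # Efficiency: reduction in distance-to-goal over the step.
    progress = np.linalg.norm(goal - pos) - np.linalg.norm(goal - next_pos)
    # Safety: linear penalty inside the safety margin, zero outside.
    safety = -max(0.0, d_safe - dist_to_obstacle)
    # Smoothness: zero for a straight continuation, negative for turns.
    v1, v2 = pos - prev_pos, next_pos - pos
    cos_turn = np.dot(v1, v2) / (
        np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    smooth = -(1.0 - cos_turn)
    return w_eff * progress + w_safe * safety + w_smooth * smooth
```

Under this shaping, a straight step toward the goal with safe clearance earns positive reward, while a sharp turn or a step inside the safety margin is penalized, which is the balance of smoothness, safety, and efficiency the abstract attributes to the reward design.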