
Computer Engineering ›› 2026, Vol. 52 ›› Issue (5): 150-159. doi: 10.19678/j.issn.1000-3428.0070256

• Computational Intelligence and Pattern Recognition •

Mobile Robot Path Planning Based on Improved TD3 Algorithm

LI Mingming, PAN Zihao*

  1. College of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710600, Shaanxi, China
  • Received: 2024-08-14  Revised: 2024-10-19  Online: 2026-05-15  Published: 2024-12-18
  • Corresponding author: PAN Zihao
  • About the authors:

    LI Mingming, female, associate professor, M.S.; her main research interest is robotics.

    PAN Zihao, M.S. candidate.

  • Funding:
    National Natural Science Foundation of China (62401459)

Abstract:

Traditional mobile robot path-planning algorithms generally require a map to plan paths effectively. By contrast, path planning based on Deep Reinforcement Learning (DRL) has attracted considerable attention because it can navigate without a map. However, conventional DRL path-planning algorithms often suffer from low sample utilization, slow training, and insufficient generalization. To address these issues, the Twin Delayed Deep Deterministic (TD3) policy gradient algorithm is improved to enhance its performance in mobile robot path planning. First, to overcome the TD3 algorithm's limited capacity for sustained exploration, its exploration strategy is improved by using temporally correlated pink noise, which strengthens the algorithm's ability to keep exploring the action space. Second, the n-step method is combined with the Loss-Adjusted Approximate Actor Prioritized (LA3P) experience replay method: the n-step method expands the immediate reward stored in the replay buffer into an n-step cumulative discounted reward, capturing long-term reward signals more accurately, while the LA3P method uses these n-step experiences efficiently, improving sample utilization and overall performance. Finally, three different environments are built in Gazebo, and the improved algorithm is compared with several existing algorithms. The experimental results show that the improved algorithm performs better in terms of training time, average success rate, and average distance, demonstrating its effectiveness.

Key words: path planning, Deep Reinforcement Learning (DRL), Twin Delayed Deep Deterministic (TD3) policy gradient, pink noise, Loss-Adjusted Approximate Actor Prioritized (LA3P) experience replay
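
The abstract does not specify how the temporally correlated pink noise is generated; one standard approach is to shape white noise in the frequency domain so that its power falls off as 1/f. The following is a minimal NumPy sketch of that approach, not the authors' implementation; the function name and the rollout usage at the end are illustrative assumptions.

```python
import numpy as np

def sample_pink_noise(n_steps, n_dims, rng=None):
    """Sample temporally correlated pink (1/f) noise via an inverse FFT.

    Returns an (n_dims, n_steps) array; each row is one action dimension's
    noise sequence whose power spectral density is proportional to 1/f,
    i.e. smoother and more persistent than uncorrelated Gaussian noise.
    """
    if rng is None:
        rng = np.random.default_rng()
    freqs = np.fft.rfftfreq(n_steps)              # 0 .. 0.5, length n_steps//2 + 1
    amp = np.zeros_like(freqs)
    amp[1:] = 1.0 / np.sqrt(freqs[1:])            # 1/f power => 1/sqrt(f) amplitude
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_dims, freqs.size))
    spectrum = amp * np.exp(1j * phases)          # shaped magnitude, random phase
    noise = np.fft.irfft(spectrum, n=n_steps, axis=-1)
    return noise / noise.std(axis=-1, keepdims=True)  # unit variance per dimension

# Hypothetical usage in a TD3 rollout: pre-sample one episode's noise, then
# perturb the deterministic policy action at each step t.
# noise = sample_pink_noise(max_episode_steps, action_dim)
# action = np.clip(actor(state) + sigma * noise[:, t], -max_action, max_action)
```

Because consecutive samples are correlated, the perturbed actions drift coherently rather than jittering around the policy output, which is what sustains exploration of the action space over long horizons.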
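The n-step reward folding described in the abstract can likewise be sketched as a small wrapper placed in front of the replay buffer. This is a minimal illustration under assumed names (NStepFolder, push); the LA3P sampling itself, which prioritizes high-TD-error transitions for the critic and, approximately, low-error ones for the actor, is omitted here.

```python
from collections import deque

class NStepFolder:
    """Fold consecutive transitions into n-step transitions before storage.

    The emitted reward is the cumulative discounted reward
    r_t + gamma*r_{t+1} + ... + gamma^(k-1)*r_{t+k-1} (k = n, or fewer at
    episode end), and the emitted next state is the state k steps ahead.
    """

    def __init__(self, n=3, gamma=0.99):
        self.n, self.gamma = n, gamma
        self.window = deque(maxlen=n)

    def push(self, s, a, r, s_next, done):
        """Add one raw transition; return a folded transition or None."""
        self.window.append((s, a, r, s_next, done))
        if len(self.window) < self.n and not done:
            return None                      # still filling the window
        ret, discount = 0.0, 1.0
        for (_, _, r_i, s_n, d_i) in self.window:
            ret += discount * r_i            # accumulate discounted rewards
            discount *= self.gamma
            if d_i:                          # stop folding at termination
                break
        s0, a0 = self.window[0][:2]
        folded = (s0, a0, ret, s_n, d_i, discount)
        if done:
            # For brevity this sketch does not flush the shorter windows
            # remaining at episode end; real implementations usually do.
            self.window.clear()
        return folded
```

The returned discount equals gamma^k for the k rewards actually folded, so the TD3 critic target on a non-terminal folded transition becomes y = ret + discount * min(Q1', Q2')(s_n, a~), bootstrapping from the state k steps ahead instead of the immediate successor.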