
Computer Engineering ›› 2025, Vol. 51 ›› Issue (1): 60-70. doi: 10.19678/j.issn.1000-3428.0068764

• Artificial Intelligence and Pattern Recognition •

  • Supported by:
    National Natural Science Foundation of China (62073154); Natural Science Foundation of Jiangsu Province (BK20231036)

Research on Path Planning of Mobile Robots Based on Autonomous Exploration

CHEN Hao, CHEN Jun*(), LIU Fei   

  1. Key Laboratory of Advanced Control in Light Industry Processes, Ministry of Education, Jiangnan University, Wuxi 214122, Jiangsu, China
  • Received: 2023-11-03 Online: 2025-01-15 Published: 2025-01-18
  • Contact: CHEN Jun


Abstract:

In path planning for mobile robots, unknown and dynamically changing environments pose challenges such as high collision rates with obstacles and susceptibility to local optima. To address these issues, this paper proposes TD3pro, an improved algorithm based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, to enhance the path-planning performance of mobile robots in unknown dynamic environments. First, a Long Short-Term Memory (LSTM) neural network is combined with the TD3 algorithm: its gate structures filter historical state information and perceive the state changes of obstacles within the sensing range, giving the robot a better understanding of the dynamic environment and of obstacle movement patterns. This enables the mobile robot to accurately predict and respond to the behavior of dynamic obstacles, thereby reducing the collision rate. Second, Ornstein-Uhlenbeck (OU) exploration noise is incorporated to sustain exploration of the surrounding environment, enhancing the robot's exploration capability and randomness. Additionally, the single experience pool is divided into three separate pools (success, failure, and temporary) to improve the sampling efficiency of effective experience samples and thus reduce training time. Finally, path-planning simulation experiments are conducted in two scenarios mixing dynamic and static obstacles. The results show that in scenario 1, compared with the Deep Deterministic Policy Gradient (DDPG) and TD3 algorithms, the proposed algorithm reduces the number of episodes to model convergence by 100-200, shortens the path length by 0.5-0.8 units, and reduces the planning time by 1-4 s. In scenario 2, compared with the TD3 algorithm, it reduces the number of episodes to model convergence by 100-300, shortens the path length by 1-3 units, and reduces the planning time by 4-8 s, while the DDPG algorithm fails: its mobile robot cannot reach the destination. The improved algorithm therefore exhibits superior path-planning performance.
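The abstract states that LSTM gate structures filter historical state information so the robot can track obstacle dynamics. The paper's actual network architecture and weights are not given here; as a minimal illustration of the gating mechanism only, the following is a single LSTM step in NumPy (all dimensions and weight matrices are hypothetical placeholders, not the authors' model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. The forget gate f decides how much of the old cell
    state c (the accumulated history) to keep, the input gate i how much
    new information to write, and the output gate o what to expose."""
    H = h.shape[0]
    z = W @ x + U @ h + b        # stacked pre-activations, shape (4H,)
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c_new = f * c + i * g        # selectively keep / overwrite history
    h_new = o * np.tanh(c_new)   # selectively expose the filtered state
    return h_new, c_new
```

Feeding the robot's recent observations through such a cell, step by step, is what lets the policy condition on a filtered summary of past obstacle positions rather than on the latest frame alone.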
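The OU exploration noise mentioned above is a standard mean-reverting stochastic process; the paper's specific parameter values are not given in the abstract. A common textbook-style implementation, with illustrative (not the authors') parameters, might look like this:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise.

    Discretised as  dx = theta * (mu - x) * dt + sigma * sqrt(dt) * dW,
    so consecutive samples drift smoothly instead of jumping independently,
    which suits continuous control actions such as wheel velocities.
    """

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu = mu * np.ones(dim)
        self.theta = theta          # strength of the pull back toward mu
        self.sigma = sigma          # scale of the random perturbation
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.x = self.mu.copy()

    def sample(self):
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.x.shape))
        self.x = self.x + dx
        return self.x
```

During training, the sampled noise is added to the actor's action output and the noise state is reset at each episode boundary.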
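The three-pool experience replay (success, failure, temporary) can be sketched as follows. This is an assumption-laden reading of the abstract, not the paper's implementation: transitions accumulate in a temporary buffer during an episode and are routed to the success or failure pool when the episode ends, and training batches mix the two pools to raise the share of informative samples:

```python
import random
from collections import deque

class PartitionedReplay:
    """Replay memory split into success / failure / temporary pools."""

    def __init__(self, capacity=10000, seed=0):
        self.success = deque(maxlen=capacity)  # episodes that reached the goal
        self.failure = deque(maxlen=capacity)  # episodes that collided / timed out
        self.temp = []                         # current episode, outcome unknown
        self.rng = random.Random(seed)

    def store(self, transition):
        self.temp.append(transition)

    def end_episode(self, reached_goal):
        # Route the whole episode to the pool matching its outcome.
        (self.success if reached_goal else self.failure).extend(self.temp)
        self.temp.clear()

    def sample(self, batch_size, success_ratio=0.5):
        # Draw a fixed share from each pool, then shuffle the batch.
        n_s = min(int(batch_size * success_ratio), len(self.success))
        n_f = min(batch_size - n_s, len(self.failure))
        batch = (self.rng.sample(list(self.success), n_s)
                 + self.rng.sample(list(self.failure), n_f))
        self.rng.shuffle(batch)
        return batch
```

The `success_ratio` parameter here is hypothetical; the abstract only states that partitioning raises the sampling efficiency of effective experience, not how the pools are weighted.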

Key words: mobile robot, path planning, Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, Long Short-Term Memory (LSTM) neural network, Ornstein-Uhlenbeck (OU) exploration noise