
Computer Engineering ›› 2025, Vol. 51 ›› Issue (1): 60-70. doi: 10.19678/j.issn.1000-3428.0068764

• Artificial Intelligence and Pattern Recognition •

  • Supported by:
    National Natural Science Foundation of China (62073154); Natural Science Foundation of Jiangsu Province (BK20231036)

Research on Path Planning of Mobile Robots Based on Autonomous Exploration

CHEN Hao, CHEN Jun*(), LIU Fei   

  1. Key Laboratory of Advanced Control in Light Industry Processes, Ministry of Education, Jiangnan University, Wuxi 214122, Jiangsu, China
  • Received: 2023-11-03 Online: 2025-01-15 Published: 2025-01-18
  • Contact: CHEN Jun


Abstract:

In path planning for mobile robots, unknown and dynamically changing environments pose challenges such as high collision rates with obstacles and susceptibility to local optima. To address these issues, this paper proposes TD3pro, an improved algorithm based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, to enhance the path-planning performance of mobile robots in unknown dynamic environments. First, a Long Short-Term Memory (LSTM) neural network is combined with the TD3 algorithm: its gate structures filter historical state information and perceive the state changes of obstacles within the sensing range, giving the robot a better understanding of the dynamic environment and of obstacle movement patterns. This enables the mobile robot to accurately predict and respond to the behavior of dynamic obstacles, thereby reducing the collision rate. Second, Ornstein-Uhlenbeck (OU) exploration noise is incorporated to sustain exploration of the surrounding environment, enhancing the robot's exploration capability and randomness. Additionally, the single experience pool is divided into three separate pools (success, failure, and temporary) to improve the sampling efficiency of effective experience samples and thus reduce training time. Finally, path-planning simulation experiments are conducted in two scenarios mixing dynamic and static obstacles. The results show that in scenario 1, compared with the Deep Deterministic Policy Gradient (DDPG) and TD3 algorithms, the proposed algorithm reduces the number of episodes to model convergence by 100-200, shortens the path length by 0.5-0.8 units, and reduces the planning time by 1-4 s. In scenario 2, compared with the TD3 algorithm, it reduces the number of episodes to model convergence by 100-300, shortens the path length by 1-3 units, and reduces the planning time by 4-8 s, while the DDPG algorithm fails: its mobile robot cannot reach the destination. The improved algorithm therefore exhibits superior path-planning performance.
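The abstract states that LSTM gate structures filter historical state information so the robot can track obstacle dynamics. The paper's actual network architecture and weights are not given here; as a minimal illustration of the gating mechanism only, the following is a single LSTM step in NumPy (all dimensions and weight matrices are hypothetical placeholders, not the authors' model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. The forget gate f decides how much of the old cell
    state c (the accumulated history) to keep, the input gate i how much
    new information to write, and the output gate o what to expose."""
    H = h.shape[0]
    z = W @ x + U @ h + b        # stacked pre-activations, shape (4H,)
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c_new = f * c + i * g        # selectively keep / overwrite history
    h_new = o * np.tanh(c_new)   # selectively expose the filtered state
    return h_new, c_new
```

Feeding the robot's recent observations through such a cell, step by step, is what lets the policy condition on a filtered summary of past obstacle positions rather than on the latest frame alone.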
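The OU exploration noise mentioned above is a standard mean-reverting stochastic process; the paper's specific parameter values are not given in the abstract. A common textbook-style implementation, with illustrative (not the authors') parameters, might look like this:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise.

    Discretised as  dx = theta * (mu - x) * dt + sigma * sqrt(dt) * dW,
    so consecutive samples drift smoothly instead of jumping independently,
    which suits continuous control actions such as wheel velocities.
    """

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu = mu * np.ones(dim)
        self.theta = theta          # strength of the pull back toward mu
        self.sigma = sigma          # scale of the random perturbation
        self.dt = dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.x = self.mu.copy()

    def sample(self):
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.x.shape))
        self.x = self.x + dx
        return self.x
```

During training, the sampled noise is added to the actor's action output and the noise state is reset at each episode boundary.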
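The three-pool experience replay (success, failure, temporary) can be sketched as follows. This is an assumption-laden reading of the abstract, not the paper's implementation: transitions accumulate in a temporary buffer during an episode and are routed to the success or failure pool when the episode ends, and training batches mix the two pools to raise the share of informative samples:

```python
import random
from collections import deque

class PartitionedReplay:
    """Replay memory split into success / failure / temporary pools."""

    def __init__(self, capacity=10000, seed=0):
        self.success = deque(maxlen=capacity)  # episodes that reached the goal
        self.failure = deque(maxlen=capacity)  # episodes that collided / timed out
        self.temp = []                         # current episode, outcome unknown
        self.rng = random.Random(seed)

    def store(self, transition):
        self.temp.append(transition)

    def end_episode(self, reached_goal):
        # Route the whole episode to the pool matching its outcome.
        (self.success if reached_goal else self.failure).extend(self.temp)
        self.temp.clear()

    def sample(self, batch_size, success_ratio=0.5):
        # Draw a fixed share from each pool, then shuffle the batch.
        n_s = min(int(batch_size * success_ratio), len(self.success))
        n_f = min(batch_size - n_s, len(self.failure))
        batch = (self.rng.sample(list(self.success), n_s)
                 + self.rng.sample(list(self.failure), n_f))
        self.rng.shuffle(batch)
        return batch
```

The `success_ratio` parameter here is hypothetical; the abstract only states that partitioning raises the sampling efficiency of effective experience, not how the pools are weighted.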

Key words: mobile robot, path planning, Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, Long Short-Term Memory (LSTM) neural network, Ornstein-Uhlenbeck (OU) exploration noise