
Computer Engineering ›› 2023, Vol. 49 ›› Issue (2): 119-126, 135. doi: 10.19678/j.issn.1000-3428.0063304

• Artificial Intelligence and Pattern Recognition •

Local Path Planning of Robot Based on Improved PPO Algorithm

LIU Guoming, LI Caihong, LI Yongdi, ZHANG Guosheng, ZHANG Yaoyu, GAO Tengteng

  1. School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, Shandong, China
  • Received: 2021-11-22  Revised: 2022-02-14  Published: 2022-03-21
  • About the authors: LIU Guoming (born 1995), male, M.S. candidate; his main research interest is intelligent systems. LI Caihong (corresponding author), professor, Ph.D. LI Yongdi, ZHANG Guosheng, ZHANG Yaoyu, and GAO Tengteng, M.S. candidates.
  • Supported by: General Program of the National Natural Science Foundation of China (61473179, 61973184).

Abstract: Training a robot local path planning model with reinforcement learning suffers from slow algorithm convergence, and the robot easily falls into deadlock regions, leaving the target unreachable. To address these problems, the traditional Proximal Policy Optimization (PPO) algorithm is improved by introducing a Long Short-Term Memory (LSTM) neural network and designing a virtual target point method, yielding the LSTM-PPO algorithm. The fully connected layer in the PPO network structure is replaced with an LSTM memory unit, which controls how much sample information is memorized or forgotten and preferentially learns samples with high reward values, so that rewards are accumulated and the model is optimized faster. On this basis, a virtual target point is added: when the environmental information collected by the radar sensor indicates that the robot has fallen into a deadlock region, the guidance from the real target point is abandoned, so that the robot escapes the trap region and moves toward the target, reducing unnecessary training in deadlock regions. The LSTM-PPO algorithm is verified by simulation in a special-obstacle scenario and a mixed-obstacle scenario and compared with the traditional PPO algorithm and the improved SDAS-PPO algorithm in terms of average reward and path length. The results show that the proposed algorithm reaches the reward peak fastest in both training scenarios, accelerates model convergence, reduces redundant path segments, improves path smoothness, and shortens path length.

Key words: robot, local path planning, Long Short-Term Memory(LSTM) neural network, Proximal Policy Optimization(PPO) algorithm, virtual target point
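
The abstract's first modification, replacing the fully connected layer of the PPO network with an LSTM memory unit so that sample information can be selectively memorized and forgotten, can be illustrated with a small actor-critic module. The sketch below is only a minimal illustration of that idea, assuming PyTorch; the class name LSTMActorCritic, the hidden width, and the observation/action dimensions are assumptions made for this example and are not taken from the paper.

import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    """PPO actor-critic whose fully connected trunk is replaced by an LSTM (illustrative sketch)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        # The LSTM memory unit takes the place of the usual fully connected
        # feature layer, so the policy can memorize or forget past observations.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.actor = nn.Linear(hidden_dim, act_dim)   # action logits / means
        self.critic = nn.Linear(hidden_dim, 1)        # state-value estimate

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim); hidden carries memory across steps
        feat, hidden = self.lstm(obs_seq, hidden)
        return self.actor(feat), self.critic(feat), hidden

# Quick shape check with a made-up 24-beam radar observation, sequence length 8.
net = LSTMActorCritic(obs_dim=24, act_dim=2)
logits, value, _ = net(torch.randn(1, 8, 24))
print(logits.shape, value.shape)  # torch.Size([1, 8, 2]) torch.Size([1, 8, 1])

During rollout, the recurrent hidden state would be carried from one time step to the next and reset at the start of each episode, which is what lets the policy react to a short history of radar readings rather than to a single frame.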

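The virtual target point method is described only at a high level: when the radar readings indicate that the robot is stuck in a deadlock region, the guidance coming from the real target point is dropped so the robot can leave the trap and then head back toward the goal. The snippet below sketches one possible form of that switch; the deadlock test, the placement of the virtual target along the most open radar direction, the distance threshold, and the function names is_deadlocked and choose_target are illustrative assumptions, not the authors' actual design.

import math

def is_deadlocked(radar_ranges, goal_bearing, block_threshold=0.5):
    """Assume a deadlock when the beams roughly facing the goal are all blocked."""
    n = len(radar_ranges)
    goal_idx = int((goal_bearing % (2 * math.pi)) / (2 * math.pi) * n)
    window = [radar_ranges[(goal_idx + k) % n] for k in range(-2, 3)]
    return all(r < block_threshold for r in window)

def choose_target(pose, goal, radar_ranges, step=1.0):
    """Return the real goal normally, or a virtual target along the most open
    radar direction while the robot is inside a deadlock region."""
    x, y, heading = pose
    gx, gy = goal
    goal_bearing = math.atan2(gy - y, gx - x) - heading
    if not is_deadlocked(radar_ranges, goal_bearing):
        return goal  # normal case: keep heading for the real target point
    n = len(radar_ranges)
    open_idx = max(range(n), key=lambda i: radar_ranges[i])
    open_dir = heading + 2 * math.pi * open_idx / n
    # Virtual target one step along the most open direction, pulling the
    # robot out of the trap before guidance from the real goal is restored.
    return (x + step * math.cos(open_dir), y + step * math.sin(open_dir))

In such a setup, the reward and the goal-related part of the state would be computed with respect to whichever point choose_target returns, so training steps spent inside the trap still pull the robot toward open space instead of being wasted.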
CLC Number: