Abstract: Traditional reinforcement learning algorithms converge slowly and may fail to reach the target because they can fall into deadlock areas. To address this, this study proposes an LSTM-PPO algorithm that combines the Proximal Policy Optimization (PPO) algorithm with a Long Short-Term Memory (LSTM) neural network and a designed virtual target point method. In this algorithm, the fully connected layer in the PPO network structure is replaced with an LSTM memory unit to control the degree to which sample information is remembered or forgotten; the algorithm prioritizes learning from high-reward samples and thus accumulates reward and optimizes the model faster. When the environmental information collected by the radar sensors indicates that the robot has fallen into a deadlock area, a virtual target point is added and guidance from the original goal point is suspended; this steers the robot out of the trapped area toward the target point and reduces unnecessary training in deadlock areas. Finally, the LSTM-PPO algorithm is simulated and verified in discrete-obstacle and special-obstacle scenes and compared with the traditional PPO and SDAS-PPO algorithms in terms of average reward and path length. The results show that the designed LSTM-PPO algorithm reaches the reward peak faster across the training scenarios, converges faster, reduces redundant road sections, improves path smoothness, and shortens path length.
local path planning,
Long Short-Term Memory (LSTM) neural network,
Proximal Policy Optimization (PPO) algorithm,
virtual target point
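The core architectural change described in the abstract, replacing the fully connected hidden layer of the PPO actor with an LSTM memory unit so the policy can retain information across observations, can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the paper's implementation; the class name, layer sizes, and discrete action space are assumptions.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """PPO actor where the hidden fully connected layer is replaced by an
    LSTM, so sample information can be selectively remembered or forgotten
    across time steps (hypothetical sketch; dimensions are assumptions)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        # LSTM replaces the usual nn.Linear hidden layer of a PPO actor.
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); state carries (h, c) between calls
        # so memory persists over an episode's rollout.
        out, state = self.lstm(obs_seq, state)
        logits = self.head(out)  # per-step action logits
        return torch.distributions.Categorical(logits=logits), state

# Example rollout step: 1 episode, 5 time steps, 8-dim radar observation.
actor = LSTMActor(obs_dim=8, act_dim=4)
dist, state = actor(torch.zeros(1, 5, 8))
action = dist.sample()  # one discrete action per time step
```

In PPO training, the returned `state` would be stored with each trajectory so that policy updates replay sequences with the same recurrent context; the critic can share or mirror the same LSTM trunk.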