
Computer Engineering ›› 2023, Vol. 49 ›› Issue (2): 119-126, 135. doi: 10.19678/j.issn.1000-3428.0063304

• Artificial Intelligence and Pattern Recognition •

Local Path Planning of Robot Based on Improved PPO Algorithm

LIU Guoming, LI Caihong, LI Yongdi, ZHANG Guosheng, ZHANG Yaoyu, GAO Tengteng

  1. School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, Shandong, China
  • Received: 2021-11-22  Revised: 2022-02-14  Published: 2022-03-21
  • About the authors: LIU Guoming (born 1995), male, M.S. candidate; his main research interest is intelligent systems. LI Caihong (corresponding author), professor, Ph.D. LI Yongdi, ZHANG Guosheng, ZHANG Yaoyu, and GAO Tengteng, M.S. candidates.
  • Supported by: General Program of the National Natural Science Foundation of China (61473179, 61973184).

Abstract: Training a robot local path planning model with reinforcement learning suffers from slow algorithm convergence, and the robot easily falls into deadlock regions, leaving the target unreachable. To address these problems, the traditional Proximal Policy Optimization (PPO) algorithm is improved by introducing a Long Short-Term Memory (LSTM) neural network and designing a virtual target point method, yielding the LSTM-PPO algorithm. The fully connected layer in the PPO network structure is replaced with an LSTM memory unit, which controls how much sample information is memorized or forgotten and preferentially learns samples with high reward values, so that rewards are accumulated and the model is optimized faster. On this basis, a virtual target point is added: when the environmental information collected by the radar sensor indicates that the robot has fallen into a deadlock region, the guidance from the real target point is abandoned, so that the robot escapes the trap region and moves toward the target, reducing unnecessary training in deadlock regions. The LSTM-PPO algorithm is verified by simulation in a special-obstacle scenario and a mixed-obstacle scenario and compared with the traditional PPO algorithm and the improved SDAS-PPO algorithm in terms of average reward and path length. The results show that the proposed algorithm reaches the reward peak fastest in both training scenarios, accelerates model convergence, reduces redundant path segments, improves path smoothness, and shortens path length.

Key words: robot, local path planning, Long Short-Term Memory(LSTM) neural network, Proximal Policy Optimization(PPO) algorithm, virtual target point
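
The abstract's first modification, replacing the fully connected layer of the PPO network with an LSTM memory unit so that sample information can be selectively memorized and forgotten, can be illustrated with a small actor-critic module. The sketch below is only a minimal illustration of that idea, assuming PyTorch; the class name LSTMActorCritic, the hidden width, and the observation/action dimensions are assumptions made for this example and are not taken from the paper.

import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    """PPO actor-critic whose fully connected trunk is replaced by an LSTM (illustrative sketch)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        # The LSTM memory unit takes the place of the usual fully connected
        # feature layer, so the policy can memorize or forget past observations.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.actor = nn.Linear(hidden_dim, act_dim)   # action logits / means
        self.critic = nn.Linear(hidden_dim, 1)        # state-value estimate

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim); hidden carries memory across steps
        feat, hidden = self.lstm(obs_seq, hidden)
        return self.actor(feat), self.critic(feat), hidden

# Quick shape check with a made-up 24-beam radar observation, sequence length 8.
net = LSTMActorCritic(obs_dim=24, act_dim=2)
logits, value, _ = net(torch.randn(1, 8, 24))
print(logits.shape, value.shape)  # torch.Size([1, 8, 2]) torch.Size([1, 8, 1])

During rollout, the recurrent hidden state would be carried from one time step to the next and reset at the start of each episode, which is what lets the policy react to a short history of radar readings rather than to a single frame.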

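The virtual target point method is described only at a high level: when the radar readings indicate that the robot is stuck in a deadlock region, the guidance coming from the real target point is dropped so the robot can leave the trap and then head back toward the goal. The snippet below sketches one possible form of that switch; the deadlock test, the placement of the virtual target along the most open radar direction, the distance threshold, and the function names is_deadlocked and choose_target are illustrative assumptions, not the authors' actual design.

import math

def is_deadlocked(radar_ranges, goal_bearing, block_threshold=0.5):
    """Assume a deadlock when the beams roughly facing the goal are all blocked."""
    n = len(radar_ranges)
    goal_idx = int((goal_bearing % (2 * math.pi)) / (2 * math.pi) * n)
    window = [radar_ranges[(goal_idx + k) % n] for k in range(-2, 3)]
    return all(r < block_threshold for r in window)

def choose_target(pose, goal, radar_ranges, step=1.0):
    """Return the real goal normally, or a virtual target along the most open
    radar direction while the robot is inside a deadlock region."""
    x, y, heading = pose
    gx, gy = goal
    goal_bearing = math.atan2(gy - y, gx - x) - heading
    if not is_deadlocked(radar_ranges, goal_bearing):
        return goal  # normal case: keep heading for the real target point
    n = len(radar_ranges)
    open_idx = max(range(n), key=lambda i: radar_ranges[i])
    open_dir = heading + 2 * math.pi * open_idx / n
    # Virtual target one step along the most open direction, pulling the
    # robot out of the trap before guidance from the real goal is restored.
    return (x + step * math.cos(open_dir), y + step * math.sin(open_dir))

In such a setup, the reward and the goal-related part of the state would be computed with respect to whichever point choose_target returns, so training steps spent inside the trap still pull the robot toward open space instead of being wasted.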
CLC Number: