基于强化学习的路径规划技术综述

doi:10.19678/j.issn.1000-3428.0060683

摘要/Abstract

摘要： 路径规划作为移动机器人自主导航的关键技术，主要是使目标对象在规定范围内找到一条从起点到终点的无碰撞安全路径。阐述基于常规方法和强化学习方法的路径规划技术，将强化学习方法主要分为基于值和基于策略两类，对比时序差分、Q-Learning等基于值的代表方法与策略梯度、模仿学习等基于策略的代表方法，并分析其融合策略和深度强化学习方法方法的发展现状。在此基础上，总结各种强化学习方法的优缺点及适用场合，同时对基于强化学习的路径规划技术的未来发展方向进行展望。

关键词: 路径规划, 强化学习, 深度强化学习, 移动机器人, 自主导航

Abstract: Path planning is one of the key technologies for autonomous navigation of mobile robots.It aims at planning a collision free optimal path from the current position to the destination in real time.This paper introduces the path planning techniques that are based on Reinforcement Learning(RL) and common methods, and categorizes the methods based on RL into two types:the value-based methods and the strategy-based methods.Then the paper compares value-based representation methods(including Timing Difference(TD), Q-Learning, etc.) and the strategy-based representation methods(including Strategy Gradient(SG) and Imitation Learning(IL), etc.), and analyzes the development status of its fusion strategy and Deep Reinforcement Learning(DRL).On this basis, the paper summarizes the advantages, disadvantages and application scenarios of the RL-based methods.Finally, the future development trends of the path planning techniques based on RL are discussed.

Key words: path planning, Reinforcement Learning(RL), Deep Reinforcement Learning(DRL), mobile robot, autonomous navigation

中图分类号:

TP242

闫皎洁, 张锲石, 胡希平. 基于强化学习的路径规划技术综述[J]. 计算机工程, 2021, 47(10): 16-25.

YAN Jiaojie, ZHANG Qieshi, HU Xiping. Review of Path Planning Techniques Based on Reinforcement Learning[J]. Computer Engineering, 2021, 47(10): 16-25.

https://www.ecice06.com/CN/Y2021/V47/I10/16

图/表 2

参考文献

[1] 戴博, 肖晓明, 蔡自兴.移动机器人路径规划技术的研究现状与展望[J].控制工程, 2005, 12(3):198-202. DAI B, XIAO X M, CAI Z X.Current status and future development of mobile robot path planning technology[J].Control Engineering of China, 2005, 12(3):198-202.(in Chinese)
[2] RAJA P.Optimal path planning of mobile robots:a review[J].International Journal of Physical Sciences, 2012, 7(9):1314-1320.
[3] 王春颖, 刘平, 秦洪政.移动机器人的智能路径规划算法综述[J].传感器与微系统, 2018, 37(8):5-8. WANG C Y, LIU P, QIN H Z.Review on intelligent path planning algorithm of mobile robots[J].Transducer and Microsystem Technologies, 2018, 37(8):5-8.(in Chinese)
[4] 张广林, 胡小梅, 柴剑飞, 等.路径规划算法及其应用综述[J].现代机械, 2011(5):85-90. ZHANG G L, HU X M, CHAI J F, et al.Summary of path planning algorithm and its application[J].Modern Machinery, 2011(5):85-90.(in Chinese)
[5] TAVARES R S, MARTINS T C, TSUZUKI M S G.Simulated annealing with adaptive neighborhood:a case study in off-line robot path planning[J].Expert Systems with Applications, 2011, 38(4):2951-2965.
[6] LIU Y C, ZHAO Y J.A virtual-waypoint based artificial potential field method for UAV path planning[C]//Proceedings of 2016 IEEE Chinese Guidance, Navigation and Control Conference.Washington D.C., USA:IEEE Press, 2016:949-953.
[7] GARCIA M A P, MONTIEL O, CASTILLO O, et al.Optimal path planning for autonomous mobile robot navigation using ant colony optimization and a fuzzy cost function evaluation[J].Applied Soft Computing, 2009, 9(3):1102-1110.
[8] 周滔, 赵津, 胡秋霞, 等.复杂环境下移动机器人全局路径规划与跟踪[J].计算机工程, 2018, 44(12):208-214. ZHOU T, ZHAO J, HU Q X, et al.Global path planning and tracking for mobile robot in cluttered environment[J].Computer Engineering, 2018, 44(12):208-214.(in Chinese)
[9] LEE T K, BAEK S H, CHOI Y H, et al.Smooth coverage path planning and control of mobile robots based on high-resolution grid map representation[J].Robotics and Autonomous Systems, 2011, 59(10):801-812.
[10] 刘传领.基于势场法和遗传算法的机器人路径规划技术研究[D].南京:南京理工大学, 2012. LIU C L.Researches on technologies for robot path planning based on artificial potential field and genetic algorithm[D].Nanjing:Nanjing University of Science and Technology, 2012.(in Chinese)
[11] ZHU A, YANG S X.A neural network approach to dynamic task assignment of multirobots[J].IEEE Transactions on Neural Networks, 2006, 17(5):1278-1287.
[12] RASHID R, PERUMAL N, ELAMVAZUTHI I, et al.Mobile robot path planning using ant colony optimization[C]//Proceedings of the 2nd IEEE International Symposium on Robotics and Manufacturing Automation.Washington D.C., USA:IEEE Press, 2016:1-6.
[13] 胡章芳, 孙林, 张毅, 等.一种基于改进QPSO的机器人路径规划算法[J].计算机工程, 2019, 45(4):281-287. HU Z F, SUN L, ZHANG Y, et al.A robot path planning algorithm based on improved QPSO[J].Computer Engineering, 2019, 45(4):281-287.(in Chinese)
[14] SUTTON R S.Learning to predict by the methods of temporal differences[J].Machine Learning, 1988, 3(1):9-44.
[15] 赵冬斌, 邵坤, 朱圆恒, 等.深度强化学习综述:兼论计算机围棋的发展[J].控制理论与应用, 2016, 33(6):701-717. ZHAO D B, SHAO K, ZHU Y H, et al.Review of deep reinforcement learning and discussions on the development of computer go[J].Control Theory & Applications, 2016, 33(6):701-717.(in Chinese)
[16] BELLMAN R.Dynamic programming and lagrange multipliers[J].Proceedings of the National Academy of Sciences, 1956, 42(10):767-769.
[17] WERBOS P J.Advanced forecasting methods for global crisis warning and models of intelligence[J].General Systems Yearbook, 1977, 22(12):25-38.
[18] WATKINS C J C H, DAYAN P.Q-learning[J].Machine Learning, 1992, 8(3/4):279-292.
[19] RUMMERY G A, NIRANJAN M.On-line q-learning using connectionist systems[M].Cambridge, UK:University of Cambridge, 1994.
[20] BERTSEKAS D P, TSITSIKLIS J N.Neuro-dynamic programming:an overview[C]//Proceedings of the 34th IEEE Conference on Decision and Control.Washington D.C., USA:IEEE Press, 1995:560-564.
[21] KOCSIS L, SZEPESVARI C.Bandit based Monte-Carlo planning[C]//Proceedings of 2016 European Conference on Machine Learning.Berlin, Germany:Springer, 2006:282-293.
[22] LEWIS F L, VRABIE D.Reinforcement learning and adaptive dynamic programming for feedback control[J].IEEE Circuits and Systems Magazine, 2009, 9(3):32-50.
[23] SILVER D, LEVER G, HEESS N, et al.Deterministic policy gradient algorithms[C]//Proceedings of 2014 International Conference on Machine Learning.Washington D.C., USA:IEEE Press, 2014:387-395.
[24] MNIH V, BADIA A P, MIRZA M, et al.Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning.Washington D.C., USA:IEEE Press, 2016:1928-1937.
[25] ROUGIER J.Comment on "ensemble averaging and the curse of dimensionality"[J].Journal of Climate, 2018, 31(21):9015-9016.
[26] SUTTON R S.Generalization in reinforcement learning:successful examples using sparse coarse coding[C]//Proceedings of 1996 International Conference Neural Information Processing Systems.Cambridge, USA:MIT Press, 1996:1038-1044.
[27] NAIR D S, SUPRIYA P.Comparison of temporal difference learning algorithm and Dijkstra's algorithm for robotic path planning[C]//Proceedings of the 2nd International Conference on Intelligent Computing and Control Systems.Washington D.C., USA:IEEE Press, 2018:1619-1624.
[28] MARTIN J, WANG J K, ENGLOT B.Sparse Gaussian process temporal difference learning for marine robot navigation[EB/OL].[2020-12-11].https://arxiv.org/abs/1810.01217.
[29] LI S D, XU X, ZUO L.Dynamic path planning of a mobile robot with improved Q-learning algorithm[C]//Proceedings of 2015 IEEE International Conference on Information and Automation.Washington D.C., USA:IEEE Press, 2015:409-414.
[30] 刘智斌, 曾晓勤, 刘惠义, 等.基于BP神经网络的双层启发式强化学习方法[J].计算机研究与发展, 2015, 52(3):579-587. LIU Z B, ZENG X Q, LIU H Y, et al.A heuristic two-layer reinforcement learning algorithm based on BP neural networks[J].Journal of Computer Research and Development, 2015, 52(3):579-587.(in Chinese)
[31] JIANG L, HUANG H Y, DING Z H.Path planning for intelligent robots based on deep Q-learning with experience replay and heuristic knowledge[J].IEEE/CAA Journal of Automatica Sinica, 2020, 7(4):1179-1189.
[32] 周文吉, 俞扬.分层强化学习综述[J].智能系统学报, 2017, 12(5):590-594. ZHOU W J, YU Y.Summarize of hierarchical reinforcement learning[J].CAAI Transactions on Intelligent Systems, 2017, 12(5):590-594.(in Chinese)
[33] BUITRAGO-MARTINEZ A, DE LA ROSA R F, LOZANO-MARTINEZ F.Hierarchical reinforcement learning approach for motion planning in mobile robotics[C]//Proceedings of 2013 Latin American Robotics Symposium and Competition.Washington D.C., USA:IEEE Press, 2013:83-88.
[34] 刘志荣, 姜树海, 袁雯雯, 等.基于深度Q学习的移动机器人路径规划[J].测控技术, 2019, 38(7):24-28. LIU Z R, JIANG S H, YUAN W W, et al.Robot path planning based on deep Q-learning[J].Measurement & Control Technology, 2019, 38(7):24-28.(in Chinese)
[35] 裴道武.关于模糊逻辑与模糊推理逻辑基础问题的十年研究综述[J].工程数学学报, 2004, 21(2):249-258. PEI D W.A survey of ten years' studies on fuzzy logic and fuzzy reasoning[J].Chinese Journal of Engineering Mathematics, 2004, 21(2):249-258.(in Chinese)
[36] LUVIANO D, YU W.Continuous-time path planning for multi-agents with fuzzy reinforcement learning[J].Journal of Intelligent & Fuzzy Systems, 2017, 33(1):491-501.
[37] BOWLING M, VELOSO M.Multiagent learning using a variable learning rate[J].Artificial Intelligence, 2002, 136(2):215-250.
[38] WEN S H, CHEN J H, LI Z, et al.Fuzzy Q-learning obstacle avoidance algorithm of humanoid robot in unknown environment[C]//Proceedings of 2018 Chinese Control Conference.Washington D.C., USA:IEEE Press, 2018:5186-5190.
[39] 葛媛, 布朋生, 刘强.模糊强化学习在机器人导航中的应用[J].信息技术, 2009, 33(10):127-130. GE Y, BU P S, LIU Q.Application of fuzzy Q-learning in robot navigation[J].Information Technology, 2009, 33(10):127-130.(in Chinese)
[40] 朴松昊, 洪炳熔.一种动态环境下移动机器人的路径规划方法[J].机器人, 2003, 25(1):18-21, 43. PIAO S H, HONG B R.A path planning approach to mobile robot under dynamic environment[J].Robot, 2003, 25(1):18-21, 43.(in Chinese)
[41] MEERZA S I A, ISLAM M, UZZAL M M.Q-learning based particle swarm optimization algorithm for optimal path planning of swarm of mobile robots[C]//Proceedings of 2019 International Conference on Advances in Science, Engineering and Robotics Technology.Washington D.C., USA:IEEE Press, 2019:1-5.
[42] SHI Z G, TU J, ZHANG Q, et al.The improved Q-Learning algorithm based on pheromone mechanism for swarm robot system[C]//Proceedings of the 32nd Chinese Control Conference.Washington D.C., USA:IEEE Press, 2013:6033-6038.
[43] YAO Q F, ZHENG Z Y, QI L, et al.Path planning method with improved artificial potential field-a reinforcement learning perspective[J].IEEE Access, 2020, 8:135513-135523.
[44] LIU Z Y, LAN F, YANG H B.Partition heuristic RRT algorithm of path planning based on Q-learning[C]//Proceedings of 2019 Advanced Information Technology, Electronic and Automation Control Conference.Washington D.C., USA:IEEE Press, 2019:386-392.
[45] 王子强, 武继刚.基于RDC-Q学习算法的移动机器人路径规划[J].计算机工程, 2014, 40(6):211-214. WANG Z Q, WU J G.Mobile robot path planning based on RDC-Q learning algorithm[J].Computer Engineering, 2014, 40(6):211-214.(in Chinese)
[46] ZOU Q J, ZHANG Y, LIU S H.A path planning algorithm based on RRT and SARSA(λ) in unknown and complex conditions[C]//Proceedings of 2020 Chinese Control and Decision Conference.Washington D.C., USA:IEEE Press, 2020:2035-2040.
[47] XU D, FANG Y C, ZHANG Z Y, et al.Path planning method combining depth learning and sarsa algorithm[C]//Proceedings of the 10th International Symposium on Computational Intelligence and Design.Washington D.C., USA:IEEE Press, 2017:77-82.
[48] FATHINEZHAD F, DERHAMI V, REZAEIAN M.Supervised fuzzy reinforcement learning for robot navigation[J].Applied Soft Computing, 2016, 40:33-41.
[49] DABOONI S, WUNSCH D.Heuristic dynamic programming for mobile robot path planning based on Dyna approach[C]//Proceedings of 2016 International Joint Conference on Neural Networks.Washington D.C., USA:IEEE Press, 2016:3723-3730.
[50] VIET H H, AN S H, CHUNG T C.Dyna-Q-based vector direction for path planning problem of autonomous mobile robots in unknown environments[J].Advanced Robotics, 2013, 27(3):159-173.
[51] HWANG K S, JIANG W C, CHEN Y J.Adaptive model learning method for reinforcement learning[C]//Proceedings of SICE'12.Washington D.C., USA:IEEE Press, 2012:1277-1280.
[52] 刘建伟, 高峰, 罗雄麟.基于值函数和策略梯度的深度强化学习综述[J].计算机学报, 2019, 42(6):1406-1438. LIU J W, GAO F, LUO X L.Survey of deep reinforcement learning based on value function and policy gradient[J].Chinese Journal of Computers, 2019, 42(6):1406-1438.(in Chinese)
[53] WANG Q Z, XU D, SHI L Y.A review on robot learning and controlling:imitation learning and human-computer interaction[C]//Proceedings of the 2013 Chinese Control and Decision Conference.Washington D.C., USA:IEEE Press, 2013:2834-2838.
[54] LIU Y D, ZHANG W Z, CHEN F M, et al.Path planning based on improved deep deterministic policy gradient algorithm[C]//Proceedings of the 3rd Information Technology, Networking, Electronic and Automation Control Conference.Washington D.C., USA:IEEE Press, 2019:295-299.
[55] PAUL S, VIG L.Deterministic policy gradient based robotic path planning with continuous action spaces[C]//Proceedings of 2017 IEEE International Conference on Computer Vision Workshops.Washington D.C., USA:IEEE Press, 2017:725-733.
[56] ZHENG S F, LIU H.Improved multi-agent deep deterministic policy gradient for path planning-based crowd simulation[J].IEEE Access, 2019, 7:147755-147770.
[57] PFEIFFER M, SHUKLA S, TURCHETTA M, et al.Reinforced imitation:sample efficient deep reinforcement learning for Mapless navigation by leveraging prior demonstrations[J].IEEE Robotics and Automation Letters, 2018, 3(4):4423-4430.
[58] HUSSEIN A, ELYAN E, GABER M M, et al.Deep imitation learning for 3D navigation tasks[J].Neural Computing and Applications, 2018, 29(7):389-404.
[59] XU J H, LIU Q W, GUO H, et al.Shared multi-task imitation learning for indoor self-navigation[C]//Proceedings of 2018 IEEE Global Communications Conference.Washington D.C., USA:IEEE Press, 2018:1-7.
[60] GRONDMAN I, BUSONIU L, LOPES G A D, et al.A survey of Actor-Critic reinforcement learning:standard and natural policy gradients[J].IEEE Transactions on Systems, Man, and Cybernetics, 2012, 42(6):1291-1307.
[61] MUSE D, WERMTER S.Actor-Critic learning for platform-independent robot navigation[J].Cognitive Computation, 2009, 1(3):203-220.
[62] LACHEKHAB F, TADJINE M.Goal seeking of mobile robot using fuzzy actor critic learning algorithm[C]//Proceedings of the 7th International Conference on Modelling, Identification and Control.Washington D.C., USA:IEEE Press, 2015:1-6.
[63] SHAO K, ZHAO D B, ZHU Y H, et al.Visual navigation with Actor-Critic deep reinforcement learning[C]//Proceedings of 2018 International Joint Conference on Neural Networks.Washington D.C., USA:IEEE Press, 2018:1-6.
[64] 刘全, 翟建伟, 章宗长, 等.深度强化学习综述[J].计算机学报, 2018, 41(1):1-27. LIU Q, ZHAI J W, ZHANG Z Z, et al.A survey on deep reinforcement learning[J].Chinese Journal of Computers, 2018, 41(1):1-27.(in Chinese)
[65] MNIH V, KAVUKCUOGLU K, SILVER D, et al.Playing Atari with deep reinforcement learning[EB/OL].[2020-12-11].https://arxiv.org/abs/1312.5602v1.
[66] TAI L, PAOLO G, LIU M.Virtual-to-real deep reinforcement learning:continuous control of mobile robots for mapless navigation[C]//Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems.Washington D.C., USA:IEEE Press, 2017:31-36.
[67] 王珂, 卜祥津, 李瑞峰, 等.景深约束下的深度强化学习机器人路径规划[J].华中科技大学学报(自然科学版), 2018, 46(12):77-82. WANG K, BU X J, LI R F, et al.Path planning for robots based on deep reinforcement learning by depth constraint[J].Journal of Huazhong University of Science and Technology(Natural Science Edition), 2018, 46(12):77-82.(in Chinese)
[68] 李辉, 祁宇明.一种复杂环境下基于深度强化学习的机器人路径规划方法[J].计算机应用研究, 2020, 37(S1):129-131. LI H, QI Y M.Robot path planning method based on deep reinforcement learning in complex environment[J].Application Research of Computers, 2020, 37(S1):129-131.(in Chinese)
[69] MNIH V, KAVUKCUOGLU K, SILVER D, et al.Human-level control through deep reinforcement learning[J].Nature, 2015, 518(7540):529-533.
[70] GU S, LILLICRAP T, SUTSKEVER I, et al.Continuous deep Q learning with model-based acceleration[EB/OL].[2020-12-11].https://www.cnblogs.com/wangxiaocvpr/p/5664795.html.

选择文件类型/文献管理软件名称

选择包含的内容