[1] 戴博, 肖晓明, 蔡自兴.移动机器人路径规划技术的研究现状与展望[J].控制工程, 2005, 12(3):198-202. DAI B, XIAO X M, CAI Z X.Current status and future development of mobile robot path planning technology[J].Control Engineering of China, 2005, 12(3):198-202.(in Chinese) [2] RAJA P.Optimal path planning of mobile robots:a review[J].International Journal of Physical Sciences, 2012, 7(9):1314-1320. [3] 王春颖, 刘平, 秦洪政.移动机器人的智能路径规划算法综述[J].传感器与微系统, 2018, 37(8):5-8. WANG C Y, LIU P, QIN H Z.Review on intelligent path planning algorithm of mobile robots[J].Transducer and Microsystem Technologies, 2018, 37(8):5-8.(in Chinese) [4] 张广林, 胡小梅, 柴剑飞, 等.路径规划算法及其应用综述[J].现代机械, 2011(5):85-90. ZHANG G L, HU X M, CHAI J F, et al.Summary of path planning algorithm and its application[J].Modern Machinery, 2011(5):85-90.(in Chinese) [5] TAVARES R S, MARTINS T C, TSUZUKI M S G.Simulated annealing with adaptive neighborhood:a case study in off-line robot path planning[J].Expert Systems with Applications, 2011, 38(4):2951-2965. [6] LIU Y C, ZHAO Y J.A virtual-waypoint based artificial potential field method for UAV path planning[C]//Proceedings of 2016 IEEE Chinese Guidance, Navigation and Control Conference.Washington D.C., USA:IEEE Press, 2016:949-953. [7] GARCIA M A P, MONTIEL O, CASTILLO O, et al.Optimal path planning for autonomous mobile robot navigation using ant colony optimization and a fuzzy cost function evaluation[J].Applied Soft Computing, 2009, 9(3):1102-1110. [8] 周滔, 赵津, 胡秋霞, 等.复杂环境下移动机器人全局路径规划与跟踪[J].计算机工程, 2018, 44(12):208-214. ZHOU T, ZHAO J, HU Q X, et al.Global path planning and tracking for mobile robot in cluttered environment[J].Computer Engineering, 2018, 44(12):208-214.(in Chinese) [9] LEE T K, BAEK S H, CHOI Y H, et al.Smooth coverage path planning and control of mobile robots based on high-resolution grid map representation[J].Robotics and Autonomous Systems, 2011, 59(10):801-812. [10] 刘传领.基于势场法和遗传算法的机器人路径规划技术研究[D].南京:南京理工大学, 2012. LIU C L.Researches on technologies for robot path planning based on artificial potential field and genetic algorithm[D].Nanjing:Nanjing University of Science and Technology, 2012.(in Chinese) [11] ZHU A, YANG S X.A neural network approach to dynamic task assignment of multirobots[J].IEEE Transactions on Neural Networks, 2006, 17(5):1278-1287. [12] RASHID R, PERUMAL N, ELAMVAZUTHI I, et al.Mobile robot path planning using ant colony optimization[C]//Proceedings of the 2nd IEEE International Symposium on Robotics and Manufacturing Automation.Washington D.C., USA:IEEE Press, 2016:1-6. [13] 胡章芳, 孙林, 张毅, 等.一种基于改进QPSO的机器人路径规划算法[J].计算机工程, 2019, 45(4):281-287. HU Z F, SUN L, ZHANG Y, et al.A robot path planning algorithm based on improved QPSO[J].Computer Engineering, 2019, 45(4):281-287.(in Chinese) [14] SUTTON R S.Learning to predict by the methods of temporal differences[J].Machine Learning, 1988, 3(1):9-44. [15] 赵冬斌, 邵坤, 朱圆恒, 等.深度强化学习综述:兼论计算机围棋的发展[J].控制理论与应用, 2016, 33(6):701-717. ZHAO D B, SHAO K, ZHU Y H, et al.Review of deep reinforcement learning and discussions on the development of computer go[J].Control Theory & Applications, 2016, 33(6):701-717.(in Chinese) [16] BELLMAN R.Dynamic programming and lagrange multipliers[J].Proceedings of the National Academy of Sciences, 1956, 42(10):767-769. [17] WERBOS P J.Advanced forecasting methods for global crisis warning and models of intelligence[J].General Systems Yearbook, 1977, 22(12):25-38. [18] WATKINS C J C H, DAYAN P.Q-learning[J].Machine Learning, 1992, 8(3/4):279-292. [19] RUMMERY G A, NIRANJAN M.On-line q-learning using connectionist systems[M].Cambridge, UK:University of Cambridge, 1994. [20] BERTSEKAS D P, TSITSIKLIS J N.Neuro-dynamic programming:an overview[C]//Proceedings of the 34th IEEE Conference on Decision and Control.Washington D.C., USA:IEEE Press, 1995:560-564. [21] KOCSIS L, SZEPESVARI C.Bandit based Monte-Carlo planning[C]//Proceedings of 2016 European Conference on Machine Learning.Berlin, Germany:Springer, 2006:282-293. [22] LEWIS F L, VRABIE D.Reinforcement learning and adaptive dynamic programming for feedback control[J].IEEE Circuits and Systems Magazine, 2009, 9(3):32-50. [23] SILVER D, LEVER G, HEESS N, et al.Deterministic policy gradient algorithms[C]//Proceedings of 2014 International Conference on Machine Learning.Washington D.C., USA:IEEE Press, 2014:387-395. [24] MNIH V, BADIA A P, MIRZA M, et al.Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning.Washington D.C., USA:IEEE Press, 2016:1928-1937. [25] ROUGIER J.Comment on "ensemble averaging and the curse of dimensionality"[J].Journal of Climate, 2018, 31(21):9015-9016. [26] SUTTON R S.Generalization in reinforcement learning:successful examples using sparse coarse coding[C]//Proceedings of 1996 International Conference Neural Information Processing Systems.Cambridge, USA:MIT Press, 1996:1038-1044. [27] NAIR D S, SUPRIYA P.Comparison of temporal difference learning algorithm and Dijkstra's algorithm for robotic path planning[C]//Proceedings of the 2nd International Conference on Intelligent Computing and Control Systems.Washington D.C., USA:IEEE Press, 2018:1619-1624. [28] MARTIN J, WANG J K, ENGLOT B.Sparse Gaussian process temporal difference learning for marine robot navigation[EB/OL].[2020-12-11].https://arxiv.org/abs/1810.01217. [29] LI S D, XU X, ZUO L.Dynamic path planning of a mobile robot with improved Q-learning algorithm[C]//Proceedings of 2015 IEEE International Conference on Information and Automation.Washington D.C., USA:IEEE Press, 2015:409-414. [30] 刘智斌, 曾晓勤, 刘惠义, 等.基于BP神经网络的双层启发式强化学习方法[J].计算机研究与发展, 2015, 52(3):579-587. LIU Z B, ZENG X Q, LIU H Y, et al.A heuristic two-layer reinforcement learning algorithm based on BP neural networks[J].Journal of Computer Research and Development, 2015, 52(3):579-587.(in Chinese) [31] JIANG L, HUANG H Y, DING Z H.Path planning for intelligent robots based on deep Q-learning with experience replay and heuristic knowledge[J].IEEE/CAA Journal of Automatica Sinica, 2020, 7(4):1179-1189. [32] 周文吉, 俞扬.分层强化学习综述[J].智能系统学报, 2017, 12(5):590-594. ZHOU W J, YU Y.Summarize of hierarchical reinforcement learning[J].CAAI Transactions on Intelligent Systems, 2017, 12(5):590-594.(in Chinese) [33] BUITRAGO-MARTINEZ A, DE LA ROSA R F, LOZANO-MARTINEZ F.Hierarchical reinforcement learning approach for motion planning in mobile robotics[C]//Proceedings of 2013 Latin American Robotics Symposium and Competition.Washington D.C., USA:IEEE Press, 2013:83-88. [34] 刘志荣, 姜树海, 袁雯雯, 等.基于深度Q学习的移动机器人路径规划[J].测控技术, 2019, 38(7):24-28. LIU Z R, JIANG S H, YUAN W W, et al.Robot path planning based on deep Q-learning[J].Measurement & Control Technology, 2019, 38(7):24-28.(in Chinese) [35] 裴道武.关于模糊逻辑与模糊推理逻辑基础问题的十年研究综述[J].工程数学学报, 2004, 21(2):249-258. PEI D W.A survey of ten years' studies on fuzzy logic and fuzzy reasoning[J].Chinese Journal of Engineering Mathematics, 2004, 21(2):249-258.(in Chinese) [36] LUVIANO D, YU W.Continuous-time path planning for multi-agents with fuzzy reinforcement learning[J].Journal of Intelligent & Fuzzy Systems, 2017, 33(1):491-501. [37] BOWLING M, VELOSO M.Multiagent learning using a variable learning rate[J].Artificial Intelligence, 2002, 136(2):215-250. [38] WEN S H, CHEN J H, LI Z, et al.Fuzzy Q-learning obstacle avoidance algorithm of humanoid robot in unknown environment[C]//Proceedings of 2018 Chinese Control Conference.Washington D.C., USA:IEEE Press, 2018:5186-5190. [39] 葛媛, 布朋生, 刘强.模糊强化学习在机器人导航中的应用[J].信息技术, 2009, 33(10):127-130. GE Y, BU P S, LIU Q.Application of fuzzy Q-learning in robot navigation[J].Information Technology, 2009, 33(10):127-130.(in Chinese) [40] 朴松昊, 洪炳熔.一种动态环境下移动机器人的路径规划方法[J].机器人, 2003, 25(1):18-21, 43. PIAO S H, HONG B R.A path planning approach to mobile robot under dynamic environment[J].Robot, 2003, 25(1):18-21, 43.(in Chinese) [41] MEERZA S I A, ISLAM M, UZZAL M M.Q-learning based particle swarm optimization algorithm for optimal path planning of swarm of mobile robots[C]//Proceedings of 2019 International Conference on Advances in Science, Engineering and Robotics Technology.Washington D.C., USA:IEEE Press, 2019:1-5. [42] SHI Z G, TU J, ZHANG Q, et al.The improved Q-Learning algorithm based on pheromone mechanism for swarm robot system[C]//Proceedings of the 32nd Chinese Control Conference.Washington D.C., USA:IEEE Press, 2013:6033-6038. [43] YAO Q F, ZHENG Z Y, QI L, et al.Path planning method with improved artificial potential field-a reinforcement learning perspective[J].IEEE Access, 2020, 8:135513-135523. [44] LIU Z Y, LAN F, YANG H B.Partition heuristic RRT algorithm of path planning based on Q-learning[C]//Proceedings of 2019 Advanced Information Technology, Electronic and Automation Control Conference.Washington D.C., USA:IEEE Press, 2019:386-392. [45] 王子强, 武继刚.基于RDC-Q学习算法的移动机器人路径规划[J].计算机工程, 2014, 40(6):211-214. WANG Z Q, WU J G.Mobile robot path planning based on RDC-Q learning algorithm[J].Computer Engineering, 2014, 40(6):211-214.(in Chinese) [46] ZOU Q J, ZHANG Y, LIU S H.A path planning algorithm based on RRT and SARSA(λ) in unknown and complex conditions[C]//Proceedings of 2020 Chinese Control and Decision Conference.Washington D.C., USA:IEEE Press, 2020:2035-2040. [47] XU D, FANG Y C, ZHANG Z Y, et al.Path planning method combining depth learning and sarsa algorithm[C]//Proceedings of the 10th International Symposium on Computational Intelligence and Design.Washington D.C., USA:IEEE Press, 2017:77-82. [48] FATHINEZHAD F, DERHAMI V, REZAEIAN M.Supervised fuzzy reinforcement learning for robot navigation[J].Applied Soft Computing, 2016, 40:33-41. [49] DABOONI S, WUNSCH D.Heuristic dynamic programming for mobile robot path planning based on Dyna approach[C]//Proceedings of 2016 International Joint Conference on Neural Networks.Washington D.C., USA:IEEE Press, 2016:3723-3730. [50] VIET H H, AN S H, CHUNG T C.Dyna-Q-based vector direction for path planning problem of autonomous mobile robots in unknown environments[J].Advanced Robotics, 2013, 27(3):159-173. [51] HWANG K S, JIANG W C, CHEN Y J.Adaptive model learning method for reinforcement learning[C]//Proceedings of SICE'12.Washington D.C., USA:IEEE Press, 2012:1277-1280. [52] 刘建伟, 高峰, 罗雄麟.基于值函数和策略梯度的深度强化学习综述[J].计算机学报, 2019, 42(6):1406-1438. LIU J W, GAO F, LUO X L.Survey of deep reinforcement learning based on value function and policy gradient[J].Chinese Journal of Computers, 2019, 42(6):1406-1438.(in Chinese) [53] WANG Q Z, XU D, SHI L Y.A review on robot learning and controlling:imitation learning and human-computer interaction[C]//Proceedings of the 2013 Chinese Control and Decision Conference.Washington D.C., USA:IEEE Press, 2013:2834-2838. [54] LIU Y D, ZHANG W Z, CHEN F M, et al.Path planning based on improved deep deterministic policy gradient algorithm[C]//Proceedings of the 3rd Information Technology, Networking, Electronic and Automation Control Conference.Washington D.C., USA:IEEE Press, 2019:295-299. [55] PAUL S, VIG L.Deterministic policy gradient based robotic path planning with continuous action spaces[C]//Proceedings of 2017 IEEE International Conference on Computer Vision Workshops.Washington D.C., USA:IEEE Press, 2017:725-733. [56] ZHENG S F, LIU H.Improved multi-agent deep deterministic policy gradient for path planning-based crowd simulation[J].IEEE Access, 2019, 7:147755-147770. [57] PFEIFFER M, SHUKLA S, TURCHETTA M, et al.Reinforced imitation:sample efficient deep reinforcement learning for Mapless navigation by leveraging prior demonstrations[J].IEEE Robotics and Automation Letters, 2018, 3(4):4423-4430. [58] HUSSEIN A, ELYAN E, GABER M M, et al.Deep imitation learning for 3D navigation tasks[J].Neural Computing and Applications, 2018, 29(7):389-404. [59] XU J H, LIU Q W, GUO H, et al.Shared multi-task imitation learning for indoor self-navigation[C]//Proceedings of 2018 IEEE Global Communications Conference.Washington D.C., USA:IEEE Press, 2018:1-7. [60] GRONDMAN I, BUSONIU L, LOPES G A D, et al.A survey of Actor-Critic reinforcement learning:standard and natural policy gradients[J].IEEE Transactions on Systems, Man, and Cybernetics, 2012, 42(6):1291-1307. [61] MUSE D, WERMTER S.Actor-Critic learning for platform-independent robot navigation[J].Cognitive Computation, 2009, 1(3):203-220. [62] LACHEKHAB F, TADJINE M.Goal seeking of mobile robot using fuzzy actor critic learning algorithm[C]//Proceedings of the 7th International Conference on Modelling, Identification and Control.Washington D.C., USA:IEEE Press, 2015:1-6. [63] SHAO K, ZHAO D B, ZHU Y H, et al.Visual navigation with Actor-Critic deep reinforcement learning[C]//Proceedings of 2018 International Joint Conference on Neural Networks.Washington D.C., USA:IEEE Press, 2018:1-6. [64] 刘全, 翟建伟, 章宗长, 等.深度强化学习综述[J].计算机学报, 2018, 41(1):1-27. LIU Q, ZHAI J W, ZHANG Z Z, et al.A survey on deep reinforcement learning[J].Chinese Journal of Computers, 2018, 41(1):1-27.(in Chinese) [65] MNIH V, KAVUKCUOGLU K, SILVER D, et al.Playing Atari with deep reinforcement learning[EB/OL].[2020-12-11].https://arxiv.org/abs/1312.5602v1. [66] TAI L, PAOLO G, LIU M.Virtual-to-real deep reinforcement learning:continuous control of mobile robots for mapless navigation[C]//Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems.Washington D.C., USA:IEEE Press, 2017:31-36. [67] 王珂, 卜祥津, 李瑞峰, 等.景深约束下的深度强化学习机器人路径规划[J].华中科技大学学报(自然科学版), 2018, 46(12):77-82. WANG K, BU X J, LI R F, et al.Path planning for robots based on deep reinforcement learning by depth constraint[J].Journal of Huazhong University of Science and Technology(Natural Science Edition), 2018, 46(12):77-82.(in Chinese) [68] 李辉, 祁宇明.一种复杂环境下基于深度强化学习的机器人路径规划方法[J].计算机应用研究, 2020, 37(S1):129-131. LI H, QI Y M.Robot path planning method based on deep reinforcement learning in complex environment[J].Application Research of Computers, 2020, 37(S1):129-131.(in Chinese) [69] MNIH V, KAVUKCUOGLU K, SILVER D, et al.Human-level control through deep reinforcement learning[J].Nature, 2015, 518(7540):529-533. [70] GU S, LILLICRAP T, SUTSKEVER I, et al.Continuous deep Q learning with model-based acceleration[EB/OL].[2020-12-11].https://www.cnblogs.com/wangxiaocvpr/p/5664795.html. |