[1] 于振中, 李强, 樊启高.智能仿生算法在移动机器人路径规划优化中的应用综述[J].计算机应用研究, 2019, 36(11):3210-3219. YU Z Z, LI Q, FAN Q G.Survey on application of bioinspired intelligent algorithms in path planning optimization of mobile robots[J].Application Research of Computers, 2019, 36(11):3210-3219.(in Chinese) [2] 潘昕, 吴旭升, 侯新国, 等.基于遗传蚂蚁混合算法的AUV全局路径规划[J].华中科技大学学报(自然科学版), 2017, 45(5):45-49, 76. PAN X, WU X S, HOU X G, et al.Global path planning based on genetic-ant hybrid algorithm for AUV[J].Journal of Huazhong University of Science and Technology(Natural Science Edition), 2017, 45(5):45-49, 76.(in Chinese) [3] 宋晓琳, 周南, 黄正瑜, 等.改进RRT在汽车避障局部路径规划中的应用[J].湖南大学学报(自然科学版), 2017, 44(4):30-37. SONG X L, ZHOU N, HUANG Z Y, et al.An improved RRT algorithm of local path planning for vehicle collision avoidance[J].Journal of Hunan University(Natural Sciences), 2017, 44(4):30-37.(in Chinese) [4] CHEN Y B, LUO G C, MEI Y S, et al.UAV path planning using artificial potential field method updated by optimal control theory[J].International Journal of Systems Science, 2016, 47(6):1407-1420. [5] YEN C T, CHENG M F.A study of fuzzy control with ant colony algorithm used in mobile robot for shortest path planning and obstacle avoidance[J].Microsystem Technologies, 2018, 24(1):125-135. [6] PANOV A I, YAKOVLEV K S, SUVOROV R.Grid path planning with deep reinforcement learning:preliminary results[J].Procedia Computer Science, 2018, 123:347-353. [7] MACEK K, PETROVIC I, PERIC N.A reinforcement learning approach to obstacle avoidance of mobile robots[C]//Proceedings of the 7th International Workshop on Advanced Motion Control.Washington D.C., USA:IEEE Press, 2002:462-466. [8] ZHANG Q, LI M, WANG X S, et al.Reinforcement learning in robot path optimization[J].Journal of Software, 2012, 7(3):657-662. [9] 刘智斌, 曾晓勤, 刘惠义, 等.基于BP神经网络的双层启发式强化学习方法[J].计算机研究与发展, 2015, 52(3):579-587. LIU Z B, ZENG X Q, LIU H Y, et al.A heuristic two-layer reinforcement learning algorithm based on BP neural networks[J].Journal of Computer Research and Development, 2015, 52(3):579-587.(in Chinese) [10] 刘全, 翟建伟, 章宗长, 等.深度强化学习综述[J].计算机学报, 2018, 41(1):1-27. LIU Q, ZHAI J W, ZHANG Z C, et al.Overview of deep reinforcement learning[J].Chinese Journal of Computers, 2018, 41(1):1-27.(in Chinese) [11] 刘建伟, 高峰, 罗雄麟.基于值函数和策略梯度的深度强化学习综述[J].计算机学报, 2019, 42(6):1406-1438. LIU J, GAO F, LUO X.Survey of deep reinforcement learning based on value function and policy gradient[J].Chinese Journal of Computers, 2019, 42(6):1406-1438.(in Chinese) [12] WATKINS C J C H, DAYAN P.Q-learning[J].Machine Learning, 1992, 8(3/4):279-292. [13] MNIH V, KAVUKCUOGLU K, SILVER D, et al.Playing atari with deep reinforcement learning[EB/OL].[2021-08-20].https://arxiv.org/abs/1312.5602. [14] LILLICRAP T P, HUNT J J, PRITZEL A, et al.Continuous control with deep reinforcement learning[EB/OL].[2021-08-20].https://arxiv.org/abs/1509.02971. [15] WILLIAMS R J.Simple statistical gradient-following algorithms for connectionist reinforcement learning[J].Machine Learning, 1992, 8(3/4):229-256. [16] SCHULMAN J, LEVINE S, MORITZ P, et al.Trust region policy optimization[EB/OL].[2021-08-20].https://arxiv.org/abs/1502.05477. [17] SCHULMAN J, WOLSKI F, DHARIWAL P, et al.Proximal policy optimization algorithms[EB/OL].[2021-08-20].https://arxiv.org/abs/1707.06347. [18] HÄMÄLÄINEN P, BABADI A, MA X X, et al.PPO-CMA:proximal policy optimization with covariance matrix adaptation[C]//Proceedings of the 30th International Workshop on Machine Learning for Signal Processing.Washington D.C., USA:IEEE Press, 2020:1-6. [19] 申怡, 刘全.基于自指导动作选择的近端策略优化算法[J].计算机科学, 2021, 48(12):297-303. SHEN Y, LIU Q.Proximal policy optimization based on self-directed action selection[J].Computer Science, 2021, 48(12):297-303.(in Chinese) [20] GREFF K, SRIVASTAVA R K, KOUTNÍK J, et al.LSTM:a search space odyssey[J].IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(10):2222-2232. [21] ZHAO L, ROH M I, LEE S J.Control method for path following and collision avoidance of autonomous ship based on deep reinforcement learning[J].Journal of Marine Science and Technology, 2019, 27(4):293-310. [22] 王牛, 李祖枢, 李永龙, 等.带驱动直流电机两轮机器人运动系统仿真[J].系统仿真学报, 2008, 20(17):4633-4638, 4646. WANG N, LI Z S, LI Y L, et al.Motion system simulation of two wheeled robot with DC motor drive system[J].Journal of System Simulation, 2008, 20(17):4633-4638, 4646.(in Chinese) [23] 高艺, 马国庆, 于正林, 等.一种六自由度工业机器人运动学分析及三维可视化仿真[J].中国机械工程, 2016, 27(13):1726-1731. GAO Y, MA G Q, YU Z L, et al.Kinematics analysis of an 6-DOF industrial robot and its 3D visualization simulation[J].China Mechanical Engineering, 2016, 27(13):1726-1731.(in Chinese) [24] 杨惟轶, 白辰甲, 蔡超, 等.深度强化学习中稀疏奖励问题研究综述[J].计算机科学, 2020, 47(3):182-191. YANG W Y, BAI C J, CAI C, et al.Survey on sparse reward in deep reinforcement learning[J].Computer Science, 2020, 47(3):182-191.(in Chinese) [25] GAO J, YE W, GUO J, et al.Deep reinforcement learning for indoor mobile robot path planning[J].Sensors(Basel, Switzerland), 2020, 20(19):E5493. [26] SHERSTINSKY A.Fundamentals of Recurrent Neural Network(RNN) and Long Short-Term Memory(LSTM) network[J].Physica D:Nonlinear Phenomena, 2020, 404:132306. |