[1] ZHAN Weiwei,WANG Wei,CHEN Nengcheng,et al.A UAV trajectory planning using improved A* algorithm[J].Geomatics and Information Science of Wuhan University,2015,40(3):315-320.(in Chinese)占伟伟,王伟,陈能成,等.一种利用改进A*算法的无人机航迹规划[J].武汉大学学报(信息科学版),2015,40(3):315-320.
[2] LI Nan,LIU Peng,DENG Renbo,et al.Three dimensional path planning for unmanned aerial vehicles based on improved genetic algorithm[J].Computer Simulation,2017,34(12):22-25,35.(in Chinese)李楠,刘朋,邓人博,等.基于改进遗传算法的无人机三维航路规划[J].计算机仿真,2017,34(12):22-25,35.
[3] GE Yan,SHUI Wei,HAN Yu,et al.Route optimization based on Bayesian network and ant colony algorithm[J].Computer Engineering,2009,35(12):175-177.(in Chinese)葛艳,税薇,韩玉,等.基于贝叶斯网络和蚁群算法的航路优化[J].计算机工程,2009,35(12):175-177.
[4] FANG Qun,XU Qing.3D route planning for UAV based on improved PSO algorithm[J].Journal of Northwestern Polytechnical University,2017,35(1):66-73.(in Chinese)方群,徐青.基于改进粒子群算法的无人机三维航迹规划[J].西北工业大学学报,2017,35(1):66-73.
[5] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[6] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing Atari with deep reinforcement learning[EB/OL].[2020-01-23].https://arxiv.org/pdf/1312.5602v1.pdf.
[7] MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]//Proceedings of International Conference on Machine Learning.Washington D.C.,USA:IEEE Press,2016:1928-1937.
[8] JARADAT M A K,AL-ROUSAN M,QUADAN L.Reinforcement based mobile robot navigation in dynamic environment[J].Robotics and Computer Integrated Manufacturing,2011,27(1):135-149.
[9] ZHU Y,MOTTAGHI R,KOLVE E,et al.Target-driven visual navigation in indoor scenes using deep reinforcement learning[C]//Proceedings of IEEE International Conference on Robotics and Automation.Washington D.C.,USA:IEEE Press,2017:3357-3364.
[10] TAI L,LIU M.Towards cognitive exploration through deep reinforcement learning for mobile robots[EB/OL].[2020-01-23].https://arxiv.org/pdf/1610.01733.pdf.
[11] TAI L,PAOLO G,LIU M.Virtual-to-real deep reinforcement learning:continuous control of mobile robots for mapless navigation[EB/OL].[2020-01-23].https://arxiv.org/pdf/1703.00420.pdf.
[12] DING Mingyue,ZHENG Changwen,ZHOU Chengping,et al.UAV flight path planning[M].Beijing:Publishing House of Electronics Industry,2009.(in Chinese)丁明跃,郑昌文,周程平,等.无人飞行器航迹规划[M].北京:电子工业出版社,2009.
[13] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[14] SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of Go without human knowledge[J].Nature,2017,550(7676):354-359.
[15] IOFFE S,SZEGEDY C.Batch normalization:accelerating deep network training by reducing internal covariate shift[EB/OL].[2020-01-23].https://arxiv.org/pdf/1502.03167.pdf.
[16] HAHNLOSER R H R,SARPESHKAR R,MAHOWALD M A,et al.Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit[J].Nature,2000,405(6789):947-951.
[17] HUANG G,LIU Z,VAN DER MAATEN L,et al.Densely connected convolutional networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2017:4700-4708.
[18] KAUFMAN H,HOWARD R A.Dynamic programming and Markov processes[J].The American Mathematical Monthly,1961,68(2):194-201.
[19] SUTTON R S,BARTO A G.Reinforcement learning:an introduction[J].IEEE Transactions on Neural Networks,1998,9(5):1054-1068.
[20] KINGMA D P,BA J.Adam:a method for stochastic optimization[EB/OL].[2020-01-23].https://arxiv.org/pdf/1412.6980.pdf.
[21] SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policy optimization algorithms[EB/OL].[2020-01-23].https://arxiv.org/pdf/1707.06347.pdf.
[22] SCHULMAN J,LEVINE S,ABBEEL P,et al.Trust region policy optimization[EB/OL].[2020-01-23].https://arxiv.org/pdf/1502.05477.pdf.