[1] SUTTON R S,BARTO A G.Reinforcement learning:an introduction[M].Cambridge,USA:MIT Press,2018. [2] MAO Jiangyun,WU Hao,SUN Weiwei.Vehicle trajectory anomaly detection in road network via Markov decision process[J].Chinese Journal of Computers,2018,41(8):1928-1942.(in Chinese)毛江云,吴昊,孙未未.路网空间下基于马尔可夫决策过程的异常车辆轨迹检测算法[J].计算机学报,2018,41(8):1928-1942. [3] YAN Qicui.Research on the method to solve the dimension disaster problem in reinforcement learning[D].Suzhou:Soochow University,2010.(in Chinese)闫其粹.解决强化学习中维数灾问题的方法研究[D].苏州:苏州大学,2010. [4] HINTON G E,SALAKHUTDINOV R R.Reducing the dimensionality of data with neural networks[J].Science,2006,313(5786):504-507. [5] CHENG Jiayuan.Application framework of deep learning in radar communication object recognition[J].Modern Radar,2018,40(8):55-59.(in Chinese)程嘉远.深度学习在雷达通信目标识别中的应用框架[J].现代雷达,2018,40(8):55-59. [6] DENG Li,YU Dong.Deep learning:methods and applications[M].Hanover,USA:Now Publishers Inc.,2014. [7] MOUSAVI S S,SCHUKAT M,HOWLEY E.Deep reinforcement learning:an overview[C]//Proceedings of SAI Intelligent Systems Conference.Berlin,Germany:Springer,2017:426-440. [8] LIU Quan,ZHAI Jianwei,ZHANG Zongchang,et al.A survey on deep reinforcement learning[J].Chinese Journal of Computers,2018,41(1):1-27.(in Chinese)刘全,翟建伟,章宗长,等.深度强化学习综述[J].计算机学报,2018,41(1):1-27. [9] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489. [10] WANG Z,SCHAUL T,HESSEL M,et al.Dueling network architectures for deep reinforcement learning[EB/OL].[2019-04-10].https://arxiv.org/pdf/1511.06581.pdf. [11] LEVINE S,FINN C,DARRELL T,et al.End-to-end training of deep visuomotor policies[J].Journal of Machine Learning Research,2015,17(1):1-40. [12] LEVINE S,PASTOR P,KRIZHEVSKY A,et al.Learning hand-eye coordination for robotic grasping with large-scale data collection[C]//Proceedings of International Symposium on Experimental Robotics.Berlin,Germany:Springer,2016:173-184. [13] SHIBATA K,IIDA M.Acquisition of box pushing by direct-vision-based reinforcement learning[C]//Proceedings of the Society of Instrument and Control Engineers Annual Conference.Washington D.C.,USA:IEEE Press,2003:2322-2327. [14] LANGE S,RIEDMILLER M.Deep auto-encoder neural networks in reinforcement learning[C]//Proceedings of International Joint Conference on Neural Networks.Washington D.C.,USA:IEEE Press,2010:1-8. [15] KOUTNIK J,SCHMIDHUBER J,GOMEZ F.Online evolution of deep convolutional network for vision-based reinforcement learning[C]//Proceedings of International Conference on Simulation of Adaptive Behavior.Berlin,Germany:Springer,2014:260-269. [16] ABTAHI F,ZHU Z,BURRY A M.A deep reinforcement learning approach to character segmentation of license plate images[C]//Proceedings of International Conference on Machine Vision Applications.Washington D.C.,USA:IEEE Press,2015:539-542. [17] LIAO Xiaomin,YAN Shaohu,SHI Jia,et al.Deep reinforcement learning based resource allocation algorithm in cellular networks[J].Journal on Communications,2019,40(2):11-18.(in Chinese)廖晓闽,严少虎,石嘉,等.基于深度强化学习的蜂窝网资源分配算法[J].通信学报,2019,40(2):11-18. [18] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing Atari with deep reinforcement learning[EB/OL].[2019-04-10].https://arxiv.org/pdf/1312.5602.pdf. [19] WATKINS C J C H.Learning from delayed rewards[J].Robotics and Autonomous Systems,1989,15(4):233-235. [20] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533. [21] BAI Chenjia,LIU Peng,ZHAO Wei,et al.Active sampling for deep Q-learning based on TD-error adaptive correction[J].Journal of Computer Research and Development,2019,56(2):262-280.(in Chinese)白辰甲,刘鹏,赵巍,等.基于TD-error自适应校正的深度Q学习主动采样方法[J].计算机研究与发展,2019,56(2):262-280. [22] LIU Quan,ZHAI Jianwei,ZHONG Shan,et al.A deep recurrent Q-network based on visual attention mechanism[J].Chinese Journal of Computers,2017,40(6):127-140.(in Chinese)刘全,翟建伟,钟珊,等.一种基于视觉注意力机制的深度循环Q网络模型[J].计算机学报,2017,40(6):127-140. |