[1] SU Ti,YANG Ming,WANG Chunxiang,et al.Classification and regression tree based traffic merging method for self-driving vehicles[J].Acta Automatica Sinica,2018,44(1):35-43.(in Chinese)苏锑,杨明,王春香,等.一种基于分类回归树的无人车汇流决策方法[J].自动化学报,2018,44(1):35-43.
[2] WANG Ergen,SUN Jian.Merging influence factors recognition and behaviors prediction of on-ramp vehicles of urban expressway[J].Journal of Traffic and Transportation Engineering,2018,18(3):180-188.(in Chinese)王尔根,孙剑.城市快速路匝道车辆汇入影响因素识别与行为预测[J].交通运输工程学报,2018,18(3):180-188.
[3] WANG Ergen,SUN Jian,JIANG Shun,et al.Modeling the various merging behaviors at expressway on-ramp bottlenecks using support vector machine models[J].Transportation Research Procedia,2017,25:1327-1341.
[4] TESAURO G.TD-Gammon,a self-teaching backgammon program,achieves master-level play[J].Neural Computation,1994,6(2):215-219.
[5] SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of go without human knowledge[J].Nature,2017,550:354-359.
[6] KOCSIS L,SZEPESVÁRI C.Bandit based Monte-Carlo planning[C]//Proceedings of European Conference on Machine Learning.Berlin,Germany:Springer,2006:282-293.
[7] ZHAO T T,HACHIYA H,NIU G.Analysis and improvement of policy gradient estimation[J].Neural Networks,2012,26(2):118-129.
[8] ZHANG Jianpei,LIU Yang,YANG Jing,et al.Research on clustering algorithms for search engine results[J].Computer Engineering,2004,30(5):95-97.(in Chinese)张健沛,刘洋,杨静,等.搜索引擎结果聚类算法研究[J].计算机工程,2004,30(5):95-97.
[9] WATKINS C J C H,DAYAN P.Technical note:Q-learning[J].Machine Learning,1992,8(3/4):279-292.
[10] SINGH S,JAAKKOLA T,LITTMAN M L,et al.Convergence results for single step on-policy reinforcement learning algorithms[J].Machine Learning,2000,38(3):287-308.
[11] CHEN Xuesong,YANG Yimin.Survey of reinforcement learning research[J].Application Research of Computers,2010,27(8):2834-2838,2844.(in Chinese)陈学松,杨宜民.强化学习研究综述[J].计算机应用研究,2010,27(8):2834-2838,2844.
[12] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing Atari with deep reinforcement learning[EB/OL].[2018-12-01].https://arxiv.org/pdf/1312.5602v1.pdf.
[13] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518:529-533.
[14] KRÖSE B J A.Learning from delayed rewards[J].Robotics and Autonomous Systems,1995,15(4):233-235.
[15] LIU Quan,ZHAI Jianwei,ZHANG Zongzhang,et al.A survey on deep reinforcement learning[J].Chinese Journal of Computers,2018,41(1):1-27.(in Chinese)刘全,翟建伟,章宗长,等.深度强化学习综述[J].计算机学报,2018,41(1):1-27.
[16] QIAO Liang,BAO Hong,XUAN Zuxing,et al.Autonomous driving ramp merging model based on reinforcement learning[J].Computer Engineering,2018,44(7):20-24,31.(in Chinese)乔良,鲍泓,玄祖兴,等.基于强化学习的无人驾驶匝道汇入模型[J].计算机工程,2018,44(7):20-24,31.
[17] LILLICRAP T P,HUNT J J,PRITZEL A,et al.Continuous control with deep reinforcement learning[EB/OL].[2018-12-01].https://arxiv.org/pdf/1509.02971.pdf.
[18] ROSENSTEIN M T,BARTO A G.Supervised learning combined with an actor-critic architecture:02-41[R].Amherst,USA:University of Massachusetts,2002.
[19] LIN L J.Reinforcement learning for robots using neural networks[D].Pittsburgh,USA:Carnegie-Mellon University,1993.
[20] SILVER D,LEVER G,HEESS N,et al.Deterministic policy gradient algorithms[C]//Proceedings of the 31st International Conference on Machine Learning.Beijing,China:[s.n.],2014:387-395.