[1] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. 2nd ed. Cambridge, USA: MIT Press, 2018.
[2] KEARNS M, SINGH S. Near-optimal reinforcement learning in polynomial time[J]. Machine Learning, 2002, 49(2/3): 209-232.
[3] JAKSCH T, ORTNER R, AUER P. Near-optimal regret bounds for reinforcement learning[J]. Journal of Machine Learning Research, 2010, 11: 1563-1600.
[4] MONTAGUE P R. Reinforcement learning: an introduction, by Sutton, R. S. and Barto, A. G.[J]. Trends in Cognitive Sciences, 1999, 3(9): 360.
[5] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3/4): 229-256.
[6] GEIST M, PIETQUIN O. Managing uncertainty within value function approximation in reinforcement learning[C]//Proceedings of the Active Learning and Experimental Design Workshop. Sardinia, Italy: [s.n.], 2010: 92.
[7] BELLEMARE M G, SRINIVASAN S, OSTROVSKI G, et al. Unifying count-based exploration and intrinsic motivation[J]. Advances in Neural Information Processing Systems, 2016, 29: 1471-1479.
[8] OSTROVSKI G, BELLEMARE M G, VAN DEN OORD A, et al. Count-based exploration with neural density models[C]//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: PMLR, 2017: 2721-2730.
[9] PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven exploration by self-supervised prediction[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 16-17.
[10] PLAPPERT M, HOUTHOOFT R, DHARIWAL P, et al. Parameter space noise for exploration[EB/OL]. (2017-06-06)[2020-09-10]. https://arxiv.org/pdf/1706.01905.pdf.
[11] FORTUNATO M, AZAR M G, PIOT B, et al. Noisy networks for exploration[EB/OL]. (2017-06-30)[2020-09-10]. https://arxiv.org/pdf/1706.10295v1.pdf.
[12] ZHANG X, MA Y, SINGLA A. Task-agnostic exploration in reinforcement learning[EB/OL]. (2020-06-16)[2020-09-10]. https://arxiv.org/pdf/2006.09497v1.pdf.
[13] COMANICI G, PRECUP D. Optimal policy switching algorithms for reinforcement learning[C]//Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems. Richland, USA: IFAAMAS, 2010: 709-714.
[14] VAN OTTERLO M, WIERING M. Reinforcement learning and Markov decision processes[M]//WIERING M, VAN OTTERLO M. Reinforcement learning: state-of-the-art. Berlin, Germany: Springer, 2012: 3-42.
[15] FRANÇOIS-LAVET V, HENDERSON P, ISLAM R, et al. An introduction to deep reinforcement learning[J]. Foundations and Trends in Machine Learning, 2018, 11(3/4): 219-354.
[16] SHAO K, TANG Z, ZHU Y, et al. A survey of deep reinforcement learning in video games[EB/OL]. (2019-12-23)[2020-09-10]. https://arxiv.org/pdf/1912.10944.pdf.
[17] HAARNOJA T, PONG V, ZHOU A, et al. Composable deep reinforcement learning for robotic manipulation[C]//Proceedings of the IEEE International Conference on Robotics and Automation. Washington D.C., USA: IEEE Press, 2018: 6244-6251.
[18] WOLF T, DEBUT L, SANH V, et al. HuggingFace's Transformers: state-of-the-art natural language processing[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. [S.l.]: Association for Computational Linguistics, 2020: 38-45.
[19] ESTEVA A, ROBICQUET A, RAMSUNDAR B, et al. A guide to deep learning in healthcare[J]. Nature Medicine, 2019, 25(1): 24-29.
[20] YANG D, ZHAO L, LIN Z, et al. Fully parameterized quantile function for distributional reinforcement learning[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, USA: Curran Associates, 2019: 6193-6202.
[21] DABNEY W, OSTROVSKI G, SILVER D, et al. Implicit quantile networks for distributional reinforcement learning[C]//Proceedings of the 35th International Conference on Machine Learning. [S.l.]: PMLR, 2018: 1096-1105.
[22] DABNEY W, ROWLAND M, BELLEMARE M G, et al. Distributional reinforcement learning with quantile regression[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. [S.l.]: AAAI Press, 2018: 2892-2901.
[23] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: PMLR, 2016: 1928-1937.
[24] ZHANG H, CHEN H, XIAO C, et al. Robust deep reinforcement learning against adversarial perturbations on state observations[C]//Proceedings of the 34th Conference on Neural Information Processing Systems. Red Hook, USA: Curran Associates, 2020: 1-14.
[25] TOROMANOFF M, WIRBEL E, MOUTARDE F. Is deep reinforcement learning really superhuman on Atari?[C]//Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver, Canada: [s.n.], 2019: 1-5.