[1] ARULKUMARAN K, DEISENROTH M P. Deep reinforcement learning: a brief survey[J]. IEEE Signal Processing Magazine, 2017, 34(6): 26-38.
[2] HÜTTENRAUCH M, ŠOŠIĆ A, NEUMANN G. Guided deep reinforcement learning for swarm systems[EB/OL]. [2021-03-17]. https://arxiv.org/abs/1709.06011.
[3] CHU T S, WANG J, CODECÀ L, et al. Multi-agent deep reinforcement learning for large-scale traffic signal control[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(3): 1086-1095.
[4] 徐西建, 王子磊, 奚宏生. 基于深度强化学习的流媒体边缘云会话调度策略[J]. 计算机工程, 2019, 45(5): 237-242, 248.
XU X J, WANG Z L, XI H S. Session scheduling strategy for streaming media edge cloud based on deep reinforcement learning[J]. Computer Engineering, 2019, 45(5): 237-242, 248. (in Chinese)
[5] SALLAB A E, ABDOU M, PEROT E, et al. Deep reinforcement learning framework for autonomous driving[J]. Electronic Imaging, 2017, 29(19): 70-76.
[6] 韩向敏, 鲍泓, 梁军, 等. 一种基于深度强化学习的自适应巡航控制算法[J]. 计算机工程, 2018, 44(7): 32-35, 41.
HAN X M, BAO H, LIANG J, et al. An adaptive cruise control algorithm based on deep reinforcement learning[J]. Computer Engineering, 2018, 44(7): 32-35, 41. (in Chinese)
[7] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354.
[8] HERNANDEZ-LEAL P, KARTAL B, TAYLOR M E. A survey and critique of multiagent deep reinforcement learning[J]. Autonomous Agents and Multi-Agent Systems, 2019, 33(6): 750-797.
[9] TAMPUU A, MATIISEN T, KODELJA D, et al. Multiagent cooperation and competition with deep reinforcement learning[J]. PLoS One, 2017, 12(4): 17-23.
[10] DE WITT C S, GUPTA T, MAKOVIICHUK D, et al. Is independent learning all you need in the StarCraft multi-agent challenge?[EB/OL]. [2021-03-17]. http://arxiv.org/abs/2011.09533.
[11] GUPTA J K, EGOROV M, KOCHENDERFER M. Cooperative multi-agent control using deep reinforcement learning[M]. Berlin, Germany: Springer, 2017: 66-83.
[12] LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2017: 6382-6393.
[13] FOERSTER J, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2018: 2974-2982.
[14] IQBAL S, SHA F. Actor-attention-critic for multi-agent reinforcement learning[C]//Proceedings of the 36th International Conference on Machine Learning. New York, USA: ACM Press, 2019: 2961-2970.
[15] SUNEHAG P, LEVER G, GRUSLYS A, et al. Value-decomposition networks for cooperative multi-agent learning based on team reward[C]//Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems. Berlin, Germany: Springer, 2018: 2085-2087.
[16] RASHID T, SAMVELYAN M, WITT C S, et al. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning[C]//Proceedings of the 35th International Conference on Machine Learning. New York, USA: ACM Press, 2018: 4292-4301.
[17] SON K, KIM D, KANG W J, et al. QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning[C]//Proceedings of the 36th International Conference on Machine Learning. New York, USA: ACM Press, 2019: 5887-5896.
[18] OLIEHOEK F A, SPAAN M T J, VLASSIS N. Optimal and approximate Q-value functions for decentralized POMDPs[J]. Journal of Artificial Intelligence Research, 2008, 32: 289-353.
[19] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[20] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: ACM Press, 2016: 1928-1937.
[21] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. [2021-03-17]. https://arxiv.org/abs/1509.02971.
[22] SCHULMAN J, LEVINE S, MORITZ P, et al. Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning. New York, USA: ACM Press, 2015: 1889-1897.
[23] HA D, DAI A M, LE Q V. Hypernetworks[C]//Proceedings of the 5th International Conference on Learning Representations. Amherst, USA: [s.n.], 2017: 1-8.
[24] OLIEHOEK F A, AMATO C. A concise introduction to decentralized POMDPs[M]. Berlin, Germany: Springer, 2016.
[25] HESSEL M, MODAYIL J, VAN HASSELT H, et al. Rainbow: combining improvements in deep reinforcement learning[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2018: 3215-3222.
[26] JAAKKOLA T, JORDAN M I, SINGH S P. On the convergence of stochastic iterative dynamic programming algorithms[J]. Neural Computation, 1994, 6(6): 1185-1201.
[27] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[J]. IEEE Transactions on Neural Networks, 2005, 16(1): 285-286.
[28] KEARNS M J, SINGH S P. Bias-variance error bounds for temporal difference updates[C]//Proceedings of the 13th Annual Conference on Computational Learning Theory. San Francisco, USA: Morgan Kaufmann, 2000: 142-147.
[29] WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: ACM Press, 2016: 1995-2003.
[30] SAMVELYAN M, RASHID T, DE WITT C S, et al. The StarCraft multi-agent challenge[EB/OL]. [2021-03-17]. http://arxiv.org/abs/1902.04043.
[31] MAHAJAN A, RASHID T, SAMVELYAN M, et al. MAVEN: multi-agent variational exploration[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2019: 7611-7622.