[1] YANG J C, NAKHAEI A, ISELE D, et al.CM3:cooperative multi-goal multi-stage multi-agent reinforcement learning[EB/OL].[2021-10-20].https://arxiv.org/abs/1809.05188. [2] TAN M.Multi-agent reinforcement learning:independent vs.cooperative agents[C]//Proceedings of the 20th ACM International Conference on Machine Learning.Washington D.C., USA:ACM Press, 1993:330-337. [3] LOWE R, WU Y, TAMAR A, et al.Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Proceedings of Advances in Neural Information Processing Systems.Cambridge, USA:MIT Press, 2017:6382-6393. [4] LI S H, WU Y, CUI X Y, et al.Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient[J].Artificial Intelligence, 2019, 33(1):4213-4220. [5] RASHID T, SAMVELYAN M, WITT C D.QMIX:monotonic value function factorisation for deep multi-agent reinforcement learning[C]//Proceedings of International Conference on Learning Representations.Washington D.C., USA:IEEE Press, 2018:1-14. [6] LITTMAN M L.Markov games as a framework for multi-agent reinforcement learning[M].Amsterdam, Netherlands:Elsevier, 1994. [7] TAMPUU A, MATIISEN T, KODELJA D, et al.Multi-agent cooperation and competition with deep reinforcement learning[J].PLoS One, 2017, 12(4):0172395. [8] MORDATCH I, ABBEEL P.Emergence of grounded compositional language in multi-agent populations[J].Artificial Intelligence, 2018, 32(1):1-16. [9] PANAIT L, LUKE S A.Cooperative multi-agent learning:the state of the art[J].Autonomous Agents and Multi-Agent Systems, 2005, 11(3):387-434. [10] MNIH V, KAVUKCUOGLU K, SILVER D, et al.Human-level control through deep reinforcement learning[J].Nature, 2015, 518(7540):529-533. [11] OMIDSHAFIEI S, PAZIS J, AMATO C, et al.Deep decentralized multi-task multi-agent reinforcement learning under partial observability[EB/OL].[2021-10-20].https://arxiv.org/abs/1703.06182. [12] FOERSTER J, NARDELLI N, FARQUHAR G, et al.Stabilising experience replay for deep multi-agent reinforcement learning[C]//Proceedings of the 34th International Conference on Machine Learning.Washington D.C., USA:IEEE Press, 2017:1146-1155. [13] GUESTRIN C, KOLLER D, PARR R.Multi-agent planning with factored MDPs[C]//Proceedings of Advances in Neural Information Processing Systems.Cambridge, USA:MIT Press, 2002:1523-1530. [14] FANG M, GROEN F C A.Collaborative multi-agent reinforcement learning based on experience propagation[J].Journal of Systems Engineering and Electronics, 2013, 24(4):683-689. [15] SINGH A, JAIN T, SUKHBAATAR S.Learning when to communicate at scale in multiagent cooperative and competitive tasks[EB/OL].[2021-10-20].https://arxiv.org/abs/1812.09755. [16] AUSTERWEIL J L, BRAWNER S, GREENWALD A, et al.How other-regarding preferences can promote cooperation in nonzero-sum grid games[C]//Proceedings of AAAI Symposium on Challenges and Opportunities in Multi-agent Learning for Real World.[S.1.]:AAAI Press, 2016:1-18. [17] FOERSTER J, FARQUHAR G, AFOURAS T, et al.Counterfactual multi-agent policy gradients[J].Artificial Intelligence, 2018, 32(1):1-14. [18] GUPTA J K, EGOROV M, KOCHENDERFER M.Cooperative multi-agent control using deep reinforcement learning[C]//Proceedings of International Conference on Autonomous Agents and Multiagent Systems.Washington D.C., USA:IEEE Press, 2017:66-83. [19] SUNEHAG P, LEVER G, GRUSLYS A, et al.Value-decomposition networks for cooperative multi-agent learning based on team reward[C]//Proceedings of the 17th International Conference on Autonomous Agents and Multi-agent Systems.Berlin, Germany:Springer, 2017:1-18. [20] ZHANG K Q, YANG Z R, LIU H, et al.Fully decentralized multi-agent reinforcement learning with networked agents[EB/OL].[2021-10-20].https://arxiv.org/abs/1802.08757. [21] OLIEHOEK F A, SPAAN M T J, VLASSIS N.Optimal and approximate Q-value functions for decentralized POMDPs[J].Journal of Artificial Intelligence Research, 2008, 32(3):289-353. [22] OLIEHOEK F A, AMATO C.A concise introduction to decentralized POMDPs[M].Berlin, Germany:Springer, 2016. [23] HAUSKNECHT M, STONE P.Deep recurrent Q-learning for partially observable MDPs[C]//Proceedings of AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents.[S.1.]:AAAI Press, 2015:1-13. [24] HOCHREITER S, SCHMIDHUBER J.Long short-term memory[J].Neural Computation, 1997, 9(8):1735-1780. [25] CHUNG J, GULCEHRE C, CHO K, et al.Empirical evaluation of gated recurrent neural networks on sequence modeling[C]//Proceedings of Advances in Neural Information Processing Systems.Cambridge, USA:MIT Press, 2014:1-15. [26] CHUNG J, GULCEHRE C, CHO K, et al.Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL].[2021-10-20].https://arxiv.org/abs/1412.3555. [27] DUGAS C, BENGIO Y, BLISLE F, et al.Incorporating functional knowledge in neural networks[J].Journal of Machine Learning Research, 2009, 10(3):1239-1262. [28] HA D, DAI A, LE Q V.Hyper networks[C]//Proceedings of International Conference on Learning Representations.Washington D.C., USA:IEEE Press, 2017:1-17. [29] VINYALS O, EWALDS T, BARTUNOV S, et al.StarCraft II:a new challenge for reinforcement learning[EB/OL].[2021-10-20].https://arxiv.org/abs/1708.04782. |