[1] RIZK Y, AWAD M, TUNSTEL E W. Decision making in multi-agent systems: a survey[J]. IEEE Transactions on Cognitive and Developmental Systems, 2018, 10(3): 514-529.
[2] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge, USA: MIT Press, 1998.
[3] HERNANDEZ-LEAL P, KARTAL B, TAYLOR M E. A survey and critique of multi-agent deep reinforcement learning[J]. Autonomous Agents and Multi-Agent Systems, 2019, 33(6): 750-797.
[4] CHU T S, WANG J, CODECÀ L, et al. Multi-agent deep reinforcement learning for large-scale traffic signal control[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(3): 1086-1095.
[5] EL SALLAB A, ABDOU M, PEROT E, et al. Deep reinforcement learning framework for autonomous driving[J]. Electronic Imaging, 2017, 29(19): 70-76.
[6] YE P W, JIA X D, YANG X R, et al. Collaborative edge and cloud offloading for Internet of vehicles using multi-agent reinforcement learning[J]. Computer Engineering, 2021, 47(4): 13-20. (in Chinese)
[7] REN H, BEN-TZVI P. Advising reinforcement learning toward scaling agents in continuous control environments with sparse rewards[J]. Engineering Applications of Artificial Intelligence, 2020, 90: 103515.
[8] STADIE B C, LEVINE S, ABBEEL P. Incentivizing exploration in reinforcement learning with deep predictive models[EB/OL]. [2022-03-01]. https://arxiv.org/abs/1507.00814.
[9] PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven exploration by self-supervised prediction[C]//Proceedings of the 34th International Conference on Machine Learning. New York, USA: ACM Press, 2017: 2778-2787.
[10] BURDA Y, EDWARDS H, STORKEY A, et al. Exploration by random network distillation[EB/OL]. [2022-03-01]. https://arxiv.org/abs/1810.12894.
[11] SCHAFER L. Curiosity in multi-agent reinforcement learning[D]. Edinburgh, UK: The University of Edinburgh, 2019.
[12] ANSCHEL O, BARAM N, SHIMKIN N. Averaged-DQN: variance reduction and stabilization for deep reinforcement learning[EB/OL]. [2022-03-01]. https://arxiv.org/abs/1611.01929.
[13] FUJIMOTO S, HOOF H V, MEGER D. Addressing function approximation error in actor-critic methods[EB/OL]. [2022-03-01]. https://arxiv.org/pdf/1802.09477.pdf.
[14] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]//Proceedings of the 4th International Conference on Learning Representations. Washington D.C., USA: IEEE Press, 2016: 432-446.
[15] LAN Q, PAN Y, FYSHE A, et al. Maxmin Q-learning: controlling the estimation bias of Q-learning[EB/OL]. [2022-03-01]. https://dblp.uni-trier.de/rec/conf/iclr/LanPFW20.html.
[16] FENGJIA O, ZHANG F. A TD3-based multi-agent deep reinforcement learning method in mixed cooperation-competition environment[J]. Neurocomputing, 2020, 411: 206-215.
[17] LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2017: 6379-6390.
[18] OLIEHOEK F A, SPAAN M T J, VLASSIS N. Optimal and approximate Q-value functions for decentralized POMDPs[J]. Journal of Artificial Intelligence Research, 2008, 32: 289-353.
[19] NEHMZOW U, GATSOULIS Y, KERR E, et al. Novelty detection as an intrinsic motivation for cumulative learning robots[M]. Berlin, Germany: Springer, 2012.
[20] OUDEYER P Y, KAPLAN F. What is intrinsic motivation? A typology of computational approaches[J]. Frontiers in Neurorobotics, 2007, 1: 6.
[21] BARTO A G. Intrinsic motivation and reinforcement learning[M]. Berlin, Germany: Springer, 2012.
[22] SIDDIQUE N, DHAKAN P, RANO I, et al. A review of the relationship between novelty, intrinsic motivation and reinforcement learning[J]. Journal of Behavioral Robotics, 2017, 8(1): 58-69.
[23] ZHENG L, CHEN J, WANG J, et al. Episodic multi-agent reinforcement learning with curiosity-driven exploration[EB/OL]. [2022-03-01]. https://arxiv.org/abs/2111.11032.
[24] ZAHEER M, KOTTUR S, RAVANBAKHSH S, et al. Deep sets[C]//Proceedings of Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2017: 3391-3401.
[25] MAHAJAN A, RASHID T, SAMVELYAN M, et al. MAVEN: multi-agent variational exploration[EB/OL]. [2022-03-01]. https://arxiv.org/abs/1910.07483.
[26] KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. [2022-03-01]. https://arxiv.org/abs/1412.6980.