[1] URIELI D, STONE P. TacTex'13: a champion adaptive power trading agent[C]//Proceedings of the 28th AAAI Conference on Artificial Intelligence. Washington D.C., USA: AAAI Press, 2014: 465-471.
[2] BAZZAN A. Opportunities for multi-agent systems and multi-agent reinforcement learning in traffic control[J]. Autonomous Agents and Multi-Agent Systems, 2009, 18(3): 342-375.
[3] DURKOTA K, LISY V, BOSANSKY B, et al. Approximate solutions for attack graph games with imperfect information[C]//Proceedings of 2015 International Conference on Decision and Game Theory for Security. Berlin, Germany: Springer, 2015: 228-249.
[4] SHANG Tongfei, WU Jinyun, MA Jianfeng. Research on performance evaluation of wargame system based on deep reinforcement learning[J]. Journal of Physics, 2019, 1302(3): 25-37.
[5] SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354-359.
[6] SILVER D, HUANG A, MADDISON C, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[7] SILVER D, HUBERT T, SCHRITTWIESER J, et al. Mastering chess and shogi by self-play with a general reinforcement learning algorithm[EB/OL]. [2019-12-15]. https://www.researchgate.net/publication/321571298_Mastering_Chess_and_Shogi_by_Self-Play_with_a_General_Reinforcement_Learning_Algorithm.
[8] BOWLING M, BURCH N, JOHANSON M, et al. Heads-up limit hold'em poker is solved[J]. Science, 2015, 347(6218): 145-149.
[9] MORAVCIK M, SCHMID M, BURCH N, et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker[J]. Science, 2017, 356(6337): 508-513.
[10] VINYALS O, EWALDS T, BARTUNOV S, et al. StarCraft II: a new challenge for reinforcement learning[EB/OL]. [2019-12-15]. https://www.researchgate.net/publication/319151530_StarCraft_II_A_New_Challenge_for_Reinforcement_Learning.
[11] BANSAL T, PACHOCKI J, SIDOR S, et al. Emergent complexity via multi-agent competition[EB/OL]. [2019-12-15]. https://www.researchgate.net/publication/320322134_Emergent_Complexity_via_Multi-Agent_Competition.
[12] DRACHEN A, YANCEY M, MAGUIRE J, et al. Skill-based differences in spatio-temporal team behaviour in Defence of the Ancients 2[C]//Proceedings of 2014 IEEE Games, Entertainment, and Media Conference. Washington D.C., USA: IEEE Press, 2014: 1-8.
[13] MSRA. Which game is more difficult for AI? Use math to analyze[EB/OL]. [2019-12-15]. https://zhuanlan.zhihu.com/p/78321765. (in Chinese)
[14] MELKO E, NAGY B. Optimal strategy in games with chance nodes[J]. Acta Cybernetica, 2007, 18(2): 171-192.
[15] BALLARD B W. The *-minimax search procedure for trees containing chance nodes[J]. Artificial Intelligence, 1983, 21(3): 327-350.
[16] LIN Dianyu. The study of mahjong artificial intelligence[D]. Hsinchu, China: National Chiao Tung University, 2008. (in Chinese)
[17] ZHUANG Likai. Research on mahjong artificial intelligence[D]. Hsinchu, China: National Chiao Tung University, 2015. (in Chinese)
[18] SELTEN R. Bounded rationality[J]. Journal of Institutional and Theoretical Economics, 1990, 146(4): 649-658.
[19] ARIELY D. Predictably irrational[M]. New York, USA: Harper Collins, 2008.
[20] HASSELT H V, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[EB/OL]. [2019-12-15]. https://www.researchgate.net/publication/282182152_Deep_Reinforcement_Learning_with_Double_Q-learning.
[21] CAMPBELL M, MARSLAND T. A comparison of minimax tree search algorithms[J]. Artificial Intelligence, 1983, 20(4): 347-367.
[22] BELLMAN R. Dynamic programming[J]. Science, 1966, 153(3731): 34-37.
[23] SUTTON R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1): 9-44.
[24] METROPOLIS N, ULAM S. The Monte Carlo method[J]. Journal of the American Statistical Association, 1949, 44(247): 335-341.
[25] WATKINS C, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292.
[26] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. [2019-12-15]. https://www.oalib.com/paper/4042798#.X_vxnFN_kZQ.
[27] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[28] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2016: 770-778.
[29] CORTES C, VAPNIK V. Support-vector networks[J]. Machine Learning, 1995, 20(3): 273-297.
[30] KINGMA D, BA J. Adam: a method for stochastic optimization[EB/OL]. [2019-12-15]. https://www.oalib.com/paper/4068193#.X_v0JVN_kZQ.
[31] HUANG G, LIU Z, MAATEN L V D, et al. Densely connected convolutional networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 4700-4708.
[32] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Identity mappings in deep residual networks[C]//Proceedings of 2016 European Conference on Computer Vision. Berlin, Germany: Springer, 2016: 630-645.