
Computer Engineering ›› 2021, Vol. 47 ›› Issue (3): 304-310,320. doi: 10.19678/j.issn.1000-3428.0057309

• Development Research and Engineering Application •

Incomplete Information Game Algorithm Based on Expectimax Search and Double DQN

LEI Jiewei1, WANG Jiayang2, REN Hang1, YAN Tianwei1, HUANG Wei1   

  1. School of Information Engineering, Nanchang University, Nanchang 330031, China;
    2. School of Software Engineering, Jiangxi Agricultural University, Nanchang 330000, China
  • Received: 2020-02-01  Revised: 2020-03-04  Published: 2020-03-05

  • About the authors: LEI Jiewei (born 1994), male, master's student; research interests include reinforcement learning and incomplete information game algorithms. WANG Jiayang (corresponding author), Ph.D. candidate; REN Hang, master's student; YAN Tianwei, master; HUANG Wei, associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61862043); Natural Science Foundation of Jiangxi Province (20181ACB20006).

Abstract: Mahjong, a typical incomplete information game, is mainly played by programs based on the traditional Expectimax search algorithm, whose pruning strategy and valuation function are designed from artificial prior knowledge and therefore rest on unreasonable assumptions. This paper proposes an incomplete information game algorithm that combines Expectimax search with the Double DQN reinforcement learning algorithm. When the Expectimax search tree is expanded, the output of the Double DQN is used as the valuation function to estimate branches within a limited search depth, and a pruning strategy sorts the discard actions and expands only the most promising ones, pruning the search tree. When the Double DQN model is trained, the mahjong information is encoded as feature data and fed to the neural network to obtain valuations, and the Expectimax search algorithm supplies the optimal action to improve the exploration strategy. Experimental results show that, compared with the Expectimax search algorithm, the Double DQN algorithm, and other supervised learning algorithms, the proposed algorithm achieves a higher winning rate and score in mahjong games and thus better game performance.
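The search side described in the abstract (depth-limited Expectimax whose leaf valuation comes from the network output, with pruning that sorts actions and expands only the best few) can be sketched as follows. This is a minimal illustration under assumed interfaces, not the paper's implementation: `q_values` is a stand-in for the Double DQN forward pass, and the `actions_fn`/`chance_fn` game interface is hypothetical.

```python
import random

def q_values(state, actions):
    """Stand-in for the Double DQN forward pass: one estimated value per action.
    A real implementation would encode the mahjong state as features and run
    the network; here we return deterministic pseudo-random values."""
    random.seed(hash((state, tuple(actions))) % (2 ** 32))
    return {a: random.random() for a in actions}

def expectimax(state, depth, actions_fn, chance_fn, k=3):
    """Depth-limited Expectimax with network-based leaf valuation and
    top-k pruning of the action branches."""
    actions = actions_fn(state)
    q = q_values(state, actions)
    if depth == 0 or not actions:
        # Limited search depth reached: fall back to the network's estimate.
        return max(q.values(), default=0.0)
    # Pruning strategy: sort discard actions by the network's estimate and
    # expand only the best k of them.
    best = sorted(actions, key=q.get, reverse=True)[:k]
    values = []
    for a in best:
        # Chance node: expectation over possible draws / opponent reactions,
        # given as (probability, next_state) pairs.
        outcomes = chance_fn(state, a)
        v = sum(p * expectimax(s2, depth - 1, actions_fn, chance_fn, k)
                for p, s2 in outcomes)
        values.append(v)
    return max(values)
```

With a toy game interface (integer states, uniform chance outcomes), the root value is simply the best expected leaf estimate reachable within the depth limit; the same skeleton also yields the "optimal action" used to guide exploration during training by returning the argmax instead of the max.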

Key words: Double DQN algorithm, Expectimax search, incomplete information game, mahjong, reinforcement learning


