
Computer Engineering ›› 2021, Vol. 47 ›› Issue (3): 304-310,320. doi: 10.19678/j.issn.1000-3428.0057309

• Development Research and Engineering Application •

Incomplete Information Game Algorithm Based on Expectimax Search and Double DQN

LEI Jiewei1, WANG Jiayang2, REN Hang1, YAN Tianwei1, HUANG Wei1   

  1. School of Information Engineering, Nanchang University, Nanchang 330031, China;
    2. School of Software Engineering, Jiangxi Agricultural University, Nanchang 330000, China
  • Received: 2020-02-01  Revised: 2020-03-04  Published: 2020-03-05

  • About the authors: LEI Jiewei (born 1994), male, master's student; research interests include reinforcement learning and incomplete information game algorithms. WANG Jiayang (corresponding author), Ph.D. candidate; REN Hang, master's student; YAN Tianwei, master; HUANG Wei, associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61862043); Natural Science Foundation of Jiangxi Province (20181ACB20006).

Abstract: Mahjong, a typical incomplete information game, is mainly played by programs based on the traditional Expectimax search algorithm, whose pruning strategy and valuation function are designed from artificial prior knowledge and therefore rest on unreasonable assumptions. This paper proposes an incomplete information game algorithm that combines Expectimax search with the Double DQN reinforcement learning algorithm. When the Expectimax search tree is expanded, the output of the Double DQN is used as the valuation function to estimate branches within a limited search depth, and a pruning strategy sorts the discard actions and expands only the most promising ones, pruning the search tree. When the Double DQN model is trained, the mahjong information is encoded as feature data and fed to the neural network to obtain valuations, and the Expectimax search algorithm supplies the optimal action to improve the exploration strategy. Experimental results show that, compared with the Expectimax search algorithm, the Double DQN algorithm, and other supervised learning algorithms, the proposed algorithm achieves a higher winning rate and score in mahjong games and thus better game performance.
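The search side described in the abstract (depth-limited Expectimax whose leaf valuation comes from the network output, with pruning that sorts actions and expands only the best few) can be sketched as follows. This is a minimal illustration under assumed interfaces, not the paper's implementation: `q_values` is a stand-in for the Double DQN forward pass, and the `actions_fn`/`chance_fn` game interface is hypothetical.

```python
import random

def q_values(state, actions):
    """Stand-in for the Double DQN forward pass: one estimated value per action.
    A real implementation would encode the mahjong state as features and run
    the network; here we return deterministic pseudo-random values."""
    random.seed(hash((state, tuple(actions))) % (2 ** 32))
    return {a: random.random() for a in actions}

def expectimax(state, depth, actions_fn, chance_fn, k=3):
    """Depth-limited Expectimax with network-based leaf valuation and
    top-k pruning of the action branches."""
    actions = actions_fn(state)
    q = q_values(state, actions)
    if depth == 0 or not actions:
        # Limited search depth reached: fall back to the network's estimate.
        return max(q.values(), default=0.0)
    # Pruning strategy: sort discard actions by the network's estimate and
    # expand only the best k of them.
    best = sorted(actions, key=q.get, reverse=True)[:k]
    values = []
    for a in best:
        # Chance node: expectation over possible draws / opponent reactions,
        # given as (probability, next_state) pairs.
        outcomes = chance_fn(state, a)
        v = sum(p * expectimax(s2, depth - 1, actions_fn, chance_fn, k)
                for p, s2 in outcomes)
        values.append(v)
    return max(values)
```

With a toy game interface (integer states, uniform chance outcomes), the root value is simply the best expected leaf estimate reachable within the depth limit; the same skeleton also yields the "optimal action" used to guide exploration during training by returning the argmax instead of the max.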

Key words: Double DQN algorithm, Expectimax search, incomplete information game, mahjong, reinforcement learning


