Computer Engineering ›› 2023, Vol. 49 ›› Issue (6): 99-106,114. doi: 10.19678/j.issn.1000-3428.0064463

• Artificial Intelligence and Pattern Recognition •

The Influence of Decision Mechanisms on Network Cooperation Level in Q-learning Evolutionary Game

ZHANG Zundong1,2, WANG Yannan1, ZHOU Huijuan1, ZHANG Yifan3

  1. Beijing Key Laboratory of Urban Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China;
    2. Intelligent Urban Transportation Systems Laboratory, University of Washington, Seattle 98195, USA;
    3. State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China
  • Received:2022-04-14 Revised:2022-07-04 Published:2022-09-20
  • About the authors: ZHANG Zundong (born 1979), male, lecturer, Ph.D., main research interest: intelligent transportation; WANG Yannan, master's student; ZHOU Huijuan, associate professor, Ph.D.; ZHANG Yifan, Ph.D. student.
  • Funding:
    National Key Research and Development Program of China, 13th Five-Year Plan period (2018YFB1601000).

Abstract: In game decision making, individuals often cannot observe their neighbors' payoffs. To address this problem, this study exploits the self-experience learning property of Q-learning and proposes a Q-learning evolutionary game model. Because different Q-learning decision mechanisms affect the network cooperation level differently, three mechanisms (ε-greedy, Boltzmann, and Max-plus) are compared in experiments across different network types, game model parameters, and reinforcement learning parameters, and the influence of the decision mechanism on the network cooperation level is analyzed quantitatively. The experimental results show that, compared with the traditional evolutionary game model, the Q-learning evolutionary game model generally improves the cooperation level of the network, and that different Q-learning decision mechanisms affect the cooperation level differently: the cooperation level of the model using the ε-greedy decision mechanism is approximately 35% and 37% higher than that of the models using the Boltzmann and Max-plus decision mechanisms, respectively. Lower learning rates, higher discount rates, and moderate payoff uniformity promote cooperation among individuals in the network; for the model using the ε-greedy decision mechanism, the cooperation level is approximately 40% higher than under a higher learning rate and approximately 45% higher than under a lower discount rate. At higher exploration rates, the Max-plus decision mechanism, which takes the global attributes of individuals into account, yields an average network payoff approximately 22% and 17% higher than the Q-learning models using the ε-greedy and Boltzmann decision mechanisms, respectively.

Key words: Q-learning, decision mechanism, network evolutionary game, cooperation level, discount rate
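For readers unfamiliar with the decision mechanisms named in the abstract, the following is a minimal sketch of textbook ε-greedy and Boltzmann (softmax) action selection together with the standard one-step Q-learning update. It is not the authors' implementation: the function names, the two-action (cooperate/defect) setup, the use of a small integer state index, and the parameter names `epsilon`, `temperature`, `alpha` (learning rate), and `gamma` (discount rate) are all assumptions made for illustration. Max-plus is not sketched because the abstract does not describe how it is configured.

```python
import math
import random

# Assumption: each agent chooses between two actions, 0 = cooperate, 1 = defect.

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore uniformly, otherwise pick the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; the max is subtracted for numerical stability."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def boltzmann(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Standard one-step Q-learning update, using only the agent's own reward."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])

# Hypothetical usage: q_table[state][action], with two states and two actions.
rng = random.Random(0)
q_table = [[0.0, 0.0], [1.0, 2.0]]
action = epsilon_greedy(q_table[0], epsilon=0.1, rng=rng)
q_update(q_table, state=0, action=action, reward=1.0, next_state=1,
         alpha=0.5, gamma=0.9)
```

Note that the update uses only the agent's own reward, which is consistent with the abstract's premise that neighbors' payoffs are unavailable. How the state and reward are defined in the paper's model is not stated in the abstract, so the values in the usage lines are placeholders.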

CLC number: