Computer Engineering ›› 2025, Vol. 51 ›› Issue (6): 102-115. doi: 10.19678/j.issn.1000-3428.0070739

• Artificial Intelligence and Pattern Recognition •

Role Learning-based Multi-Agent Reinforcement Learning Methods

SHEN Sitong, WANG Yaowu, XIE Zaipeng*, TANG Bin

  1. College of Computer and Software, Hohai University, Nanjing 211100, Jiangsu, China
  • Received: 2024-12-23 Online: 2025-06-15 Published: 2025-04-01
  • Contact: XIE Zaipeng
  • Supported by: the "Belt and Road" Water and Sustainable Development Science and Technology Fund of the National Key Laboratory of Water Disaster Prevention (2021490811); the Young Scientists Fund of the National Natural Science Foundation of China (62102131)

Abstract:

Multi-Agent Reinforcement Learning (MARL) plays a crucial role in solving complex cooperative tasks. However, traditional methods face significant limitations in dynamic environments and under information non-stationarity. To address these challenges, this paper proposes a role learning-based multi-agent reinforcement learning framework (RoMAC). The framework divides roles based on action attributes and uses a role assignment network to dynamically allocate roles to agents, thereby improving the efficiency of multi-agent collaboration. It adopts a hierarchical communication design comprising attention-based inter-role communication and mutual information-guided inter-agent communication. In inter-role communication, attention mechanisms generate efficient messages for coordination between role delegates; in inter-agent communication, mutual information is used to generate targeted messages that improve decision-making quality within role groups. Experiments in the StarCraft Multi-Agent Challenge (SMAC) environment show that RoMAC improves the average win rate by approximately 8.62 percentage points, shortens convergence time by 0.92×10⁶ timesteps, and reduces communication load by an average of 28.18 percentage points. Ablation studies further validate the critical contribution of each module to performance, demonstrating the robustness and flexibility of the model. Overall, the experimental results indicate that RoMAC offers significant advantages in MARL and cooperative tasks, providing reliable support for efficiently solving complex tasks.
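
The abstract describes the dynamic role assignment mechanism only at a high level. As a reading aid, the following minimal PyTorch sketch illustrates one plausible form of such a role-assignment network; all module names, dimensions, and the Gumbel-softmax relaxation are assumptions of this sketch, not details reported in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RoleAssignmentNet(nn.Module):
    """Maps each agent's local observation to a (soft) role assignment.
    Illustrative sketch only; the paper's actual architecture may differ."""
    def __init__(self, obs_dim: int, n_roles: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_roles),  # logits over candidate roles
        )

    def forward(self, obs: torch.Tensor, hard: bool = True) -> torch.Tensor:
        logits = self.encoder(obs)
        # Gumbel-softmax keeps the discrete role choice differentiable,
        # so assignments can adapt dynamically during training.
        return F.gumbel_softmax(logits, tau=1.0, hard=hard)

# Usage: 5 agents with 32-dim observations, 3 candidate roles.
net = RoleAssignmentNet(obs_dim=32, n_roles=3)
roles = net(torch.randn(5, 32))   # (5, 3) one-hot role assignments
print(roles.argmax(dim=-1))       # role index chosen for each agent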

Key words: Multi-Agent Reinforcement Learning (MARL), role learning, multi-agent communication, mutual information, cooperation
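
The two-level communication design can be sketched in the same spirit: self-attention exchanges messages between role delegates, while an auxiliary mutual-information objective shapes the messages sent within a role group. The use of nn.MultiheadAttention and an InfoNCE-style lower bound on mutual information are illustrative assumptions of this sketch; the paper's estimators and architecture may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_roles, group_size = 64, 3, 4

# Level 1: inter-role communication via self-attention over role delegates.
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
delegates = torch.randn(1, n_roles, d)                     # one embedding per role
role_msgs, attn_w = attn(delegates, delegates, delegates)  # (1, n_roles, d)

# Level 2: intra-group messages trained with an InfoNCE lower bound on the
# mutual information between a message and its receiver's action embedding.
msg_head = nn.Linear(d, d)   # sender: hidden state -> message
act_head = nn.Linear(d, d)   # receiver: action embedding -> shared space

hidden = torch.randn(group_size, d)    # senders' hidden states in one group
act_emb = torch.randn(group_size, d)   # receivers' action embeddings
msgs = msg_head(hidden)

# Each message should score highest against its own receiver's action among
# all in-group actions (the negatives), which lower-bounds the MI.
scores = F.normalize(msgs, dim=-1) @ F.normalize(act_head(act_emb), dim=-1).T
mi_loss = F.cross_entropy(scores / 0.1, torch.arange(group_size))
print(role_msgs.shape, mi_loss.item())   # torch.Size([1, 3, 64]) and a scalar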