Computer Engineering ›› 2025, Vol. 51 ›› Issue (6): 102-115. doi: 10.19678/j.issn.1000-3428.0070739

• Artificial Intelligence and Pattern Recognition •

Role Learning-based Multi-Agent Reinforcement Learning Methods

SHEN Sitong, WANG Yaowu, XIE Zaipeng*, TANG Bin

  1. College of Computer and Software, Hohai University, Nanjing 211100, Jiangsu, China
  • Received: 2024-12-23 Online: 2025-06-15 Published: 2025-04-01
  • Contact: XIE Zaipeng
  • Supported by: the "Belt and Road" Water and Sustainable Development Science and Technology Fund of the National Key Laboratory of Water Disaster Prevention (2021490811); the Young Scientists Fund of the National Natural Science Foundation of China (62102131)

Abstract:

Multi-Agent Reinforcement Learning (MARL) plays a crucial role in solving complex cooperative tasks. However, traditional methods face significant limitations in dynamic environments and under information non-stationarity. To address these challenges, this paper proposes a role learning-based multi-agent reinforcement learning framework (RoMAC). The framework divides roles based on action attributes and uses a role assignment network to dynamically allocate roles to agents, thereby improving the efficiency of multi-agent collaboration. It adopts a hierarchical communication design comprising attention-based inter-role communication and mutual information-guided inter-agent communication. In inter-role communication, attention mechanisms generate efficient messages for coordination between role delegates; in inter-agent communication, mutual information is used to generate targeted messages that improve decision-making quality within role groups. Experiments in the StarCraft Multi-Agent Challenge (SMAC) environment show that RoMAC improves the average win rate by approximately 8.62 percentage points, shortens convergence time by 0.92×10⁶ timesteps, and reduces communication load by an average of 28.18 percentage points. Ablation studies further validate the critical contribution of each module to performance, demonstrating the robustness and flexibility of the model. Overall, the experimental results indicate that RoMAC offers significant advantages in MARL and cooperative tasks, providing reliable support for efficiently solving complex tasks.
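
The abstract describes the dynamic role assignment mechanism only at a high level. As a reading aid, the following minimal PyTorch sketch illustrates one plausible form of such a role-assignment network; all module names, dimensions, and the Gumbel-softmax relaxation are assumptions of this sketch, not details reported in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RoleAssignmentNet(nn.Module):
    """Maps each agent's local observation to a (soft) role assignment.
    Illustrative sketch only; the paper's actual architecture may differ."""
    def __init__(self, obs_dim: int, n_roles: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_roles),  # logits over candidate roles
        )

    def forward(self, obs: torch.Tensor, hard: bool = True) -> torch.Tensor:
        logits = self.encoder(obs)
        # Gumbel-softmax keeps the discrete role choice differentiable,
        # so assignments can adapt dynamically during training.
        return F.gumbel_softmax(logits, tau=1.0, hard=hard)

# Usage: 5 agents with 32-dim observations, 3 candidate roles.
net = RoleAssignmentNet(obs_dim=32, n_roles=3)
roles = net(torch.randn(5, 32))   # (5, 3) one-hot role assignments
print(roles.argmax(dim=-1))       # role index chosen for each agent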

Key words: Multi-Agent Reinforcement Learning (MARL), role learning, multi-agent communication, mutual information, cooperation
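
The two-level communication design can be sketched in the same spirit: self-attention exchanges messages between role delegates, while an auxiliary mutual-information objective shapes the messages sent within a role group. The use of nn.MultiheadAttention and an InfoNCE-style lower bound on mutual information are illustrative assumptions of this sketch; the paper's estimators and architecture may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_roles, group_size = 64, 3, 4

# Level 1: inter-role communication via self-attention over role delegates.
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
delegates = torch.randn(1, n_roles, d)                     # one embedding per role
role_msgs, attn_w = attn(delegates, delegates, delegates)  # (1, n_roles, d)

# Level 2: intra-group messages trained with an InfoNCE lower bound on the
# mutual information between a message and its receiver's action embedding.
msg_head = nn.Linear(d, d)   # sender: hidden state -> message
act_head = nn.Linear(d, d)   # receiver: action embedding -> shared space

hidden = torch.randn(group_size, d)    # senders' hidden states in one group
act_emb = torch.randn(group_size, d)   # receivers' action embeddings
msgs = msg_head(hidden)

# Each message should score highest against its own receiver's action among
# all in-group actions (the negatives), which lower-bounds the MI.
scores = F.normalize(msgs, dim=-1) @ F.normalize(act_head(act_emb), dim=-1).T
mi_loss = F.cross_entropy(scores / 0.1, torch.arange(group_size))
print(role_msgs.shape, mi_loss.item())   # torch.Size([1, 3, 64]) and a scalar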