Computer Engineering

Opponent Modeling Based on Cross-Attention Diffusion Model

  • Published:2025-04-10

Abstract: Opponent modeling is a key technique in multi-agent adversarial games: it learns the opponent's behavior in order to reduce environmental uncertainty and support decision-making. Most existing opponent modeling methods combine offline training with online adaptation. During offline training, they use a conventional neural dynamics model to predict the agent one step at a time, so single-step errors readily compound into cumulative error. During online adaptation, facing an unknown opponent can also cause the controlled agent's planned states to deviate from the dataset distribution. To address these problems, a framework is proposed that is based on a diffusion model and uses cross-attention to establish a correlation with the opponent. Because a diffusion model can generate a multi-step planning sequence all at once, the framework addresses the cumulative error problem. The concept of a strategy set is also proposed; through online fine-tuning, it resolves both the plan deviation problem and the problem that the offline strategy is destroyed in the initial stage of online training. Experimental results in open competitive environments with both dense and sparse rewards demonstrate the strong performance of this method.
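The abstract does not describe the network itself, so the following is only a minimal sketch of the generic cross-attention operation it refers to, not the authors' implementation. It assumes that tokens of the controlled agent's multi-step plan act as queries and tokens encoding the opponent's observed behavior act as keys and values. The function name, the dimensions, and the random projection matrices are all hypothetical.

```python
import numpy as np


def cross_attention(plan_tokens, opponent_tokens, w_q, w_k, w_v):
    """Scaled dot-product cross-attention (illustrative sketch only).

    plan_tokens:     (horizon, d_model) tokens of the multi-step plan (queries)
    opponent_tokens: (history, d_model) tokens of opponent behavior (keys/values)
    Returns the attended context (horizon, d_head) and the weights (horizon, history).
    """
    q = plan_tokens @ w_q        # project plan tokens to queries
    k = opponent_tokens @ w_k    # project opponent tokens to keys
    v = opponent_tokens @ w_v    # project opponent tokens to values
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Subtract the row-wise maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Hypothetical sizes: an 8-step plan attending to 5 opponent tokens.
rng = np.random.default_rng(0)
horizon, history, d_model, d_head = 8, 5, 16, 4
plan_tokens = rng.normal(size=(horizon, d_model))
opponent_tokens = rng.normal(size=(history, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

context, weights = cross_attention(plan_tokens, opponent_tokens, w_q, w_k, w_v)
```

Every step of the plan receives its own weighting over the opponent tokens, which is one way all steps of a jointly generated sequence could be conditioned on the opponent at the same time rather than one step after another.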