
计算机工程 (Computer Engineering)


Multi-Agent Path Planning Algorithm Based on State Action Prediction

• Published: 2025-05-22

Abstract: The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm alleviates environmental non-stationarity in multi-agent path planning by introducing global information. However, in complex environments, multi-agent reinforcement learning algorithms still suffer from sparse rewards and a low level of collaboration among agents. To address these problems, a multi-agent path planning algorithm based on state-action prediction (SA-MADDPG) is proposed. In SA-MADDPG, a Novelty Reward Module based on a Long Short-Term Memory (LSTM) network is designed; it assigns a novelty reward to an agent without relying on the agent's current observation and action, thereby alleviating the reward sparsity problem. In addition, an Action Prediction Module is designed that explicitly introduces collaborative information, together with a dynamic weight term based on Q-value gain that guides each agent in balancing the optimization of its own task policy against the optimization of the collaborative task policy, thereby raising the level of collaboration among agents. Finally, a three-dimensional multi-agent path planning simulation environment based on unmanned aerial vehicles (UAVs) is constructed to comprehensively evaluate the proposed algorithm. Experimental results show that SA-MADDPG improves the average reward by 5.26%-15.81% and reduces the average episode time by 10.96%-16.05% in the obstacle-density experiments, and improves the average reward by 16.32%-22.9% and reduces the average episode time by 15.03%-25.15% in the agent-number experiments.
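The abstract describes two mechanisms: an LSTM-based novelty (intrinsic) reward computed from an agent's past observations only, and a dynamic weight derived from a Q-value gain that balances self-task and collaborative-task policy optimization. The following is a minimal sketch of how such components could be structured; it is not the authors' implementation, and names such as NoveltyReward, obs_dim, hidden_dim, beta, and dynamic_weight are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of an LSTM-based novelty reward and a
# Q-value-gain dynamic weight, in the spirit of the SA-MADDPG description above.
import torch
import torch.nn as nn


class NoveltyReward(nn.Module):
    """Predicts the next observation from the observation history alone and uses
    the prediction error as an intrinsic (novelty) bonus."""

    def __init__(self, obs_dim: int, hidden_dim: int = 64, beta: float = 0.1):
        super().__init__()
        # The LSTM summarizes only past observations; the current observation and
        # action are deliberately excluded, as the abstract describes.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, obs_dim)  # predicts the next observation
        self.beta = beta  # scale of the novelty bonus

    @torch.no_grad()
    def intrinsic_reward(self, obs_history: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
        # obs_history: (batch, T, obs_dim); next_obs: (batch, obs_dim)
        summary, _ = self.lstm(obs_history)
        pred = self.head(summary[:, -1])               # prediction from history alone
        error = (pred - next_obs).pow(2).mean(dim=-1)  # poorly predicted => novel
        return self.beta * error

    def loss(self, obs_history: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
        # Mean-squared-error objective used to train the predictor itself.
        summary, _ = self.lstm(obs_history)
        pred = self.head(summary[:, -1])
        return (pred - next_obs).pow(2).mean()


def dynamic_weight(q_with_coop: torch.Tensor, q_without_coop: torch.Tensor) -> torch.Tensor:
    """One plausible form of a Q-value-gain weight (an assumption, not the paper's
    formula): the larger the critic's gain from the collaborative behaviour, the
    more weight is put on the collaborative (action-prediction) objective."""
    gain = q_with_coop - q_without_coop
    return torch.sigmoid(gain)
```

In a MADDPG-style training loop, the novelty bonus would typically be added to the environment reward before each transition enters the replay buffer, and the dynamic weight would scale the collaborative (action-prediction) term in the actor loss; both are sketches under the assumptions stated above rather than the paper's exact design.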