
计算机工程 (Computer Engineering)


Multi-Agent Path Planning Algorithm Based on State Action Prediction

• Published: 2025-05-22

Abstract: The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm alleviates environmental non-stationarity in multi-agent path planning by introducing global information. However, in complex environments, multi-agent reinforcement learning algorithms still suffer from sparse rewards and a low level of collaboration among agents. To address these problems, a multi-agent path planning algorithm based on state-action prediction (SA-MADDPG) is proposed. In SA-MADDPG, a Novelty Reward Module based on a Long Short-Term Memory (LSTM) network is designed; it assigns a novelty reward to an agent without relying on the agent's current observation and action, thereby alleviating the reward sparsity problem. In addition, an Action Prediction Module is designed that explicitly introduces collaborative information, together with a dynamic weight term based on Q-value gain that guides each agent in balancing the optimization of its own task policy against the optimization of the collaborative task policy, thereby raising the level of collaboration among agents. Finally, a three-dimensional multi-agent path planning simulation environment based on unmanned aerial vehicles (UAVs) is constructed to comprehensively evaluate the proposed algorithm. Experimental results show that SA-MADDPG improves the average reward by 5.26%-15.81% and reduces the average episode time by 10.96%-16.05% in the obstacle-density experiments, and improves the average reward by 16.32%-22.9% and reduces the average episode time by 15.03%-25.15% in the agent-number experiments.
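The abstract describes two mechanisms: an LSTM-based novelty (intrinsic) reward computed from an agent's past observations only, and a dynamic weight derived from a Q-value gain that balances self-task and collaborative-task policy optimization. The following is a minimal sketch of how such components could be structured; it is not the authors' implementation, and names such as NoveltyReward, obs_dim, hidden_dim, beta, and dynamic_weight are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of an LSTM-based novelty reward and a
# Q-value-gain dynamic weight, in the spirit of the SA-MADDPG description above.
import torch
import torch.nn as nn


class NoveltyReward(nn.Module):
    """Predicts the next observation from the observation history alone and uses
    the prediction error as an intrinsic (novelty) bonus."""

    def __init__(self, obs_dim: int, hidden_dim: int = 64, beta: float = 0.1):
        super().__init__()
        # The LSTM summarizes only past observations; the current observation and
        # action are deliberately excluded, as the abstract describes.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, obs_dim)  # predicts the next observation
        self.beta = beta  # scale of the novelty bonus

    @torch.no_grad()
    def intrinsic_reward(self, obs_history: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
        # obs_history: (batch, T, obs_dim); next_obs: (batch, obs_dim)
        summary, _ = self.lstm(obs_history)
        pred = self.head(summary[:, -1])               # prediction from history alone
        error = (pred - next_obs).pow(2).mean(dim=-1)  # poorly predicted => novel
        return self.beta * error

    def loss(self, obs_history: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
        # Mean-squared-error objective used to train the predictor itself.
        summary, _ = self.lstm(obs_history)
        pred = self.head(summary[:, -1])
        return (pred - next_obs).pow(2).mean()


def dynamic_weight(q_with_coop: torch.Tensor, q_without_coop: torch.Tensor) -> torch.Tensor:
    """One plausible form of a Q-value-gain weight (an assumption, not the paper's
    formula): the larger the critic's gain from the collaborative behaviour, the
    more weight is put on the collaborative (action-prediction) objective."""
    gain = q_with_coop - q_without_coop
    return torch.sigmoid(gain)
```

In a MADDPG-style training loop, the novelty bonus would typically be added to the environment reward before each transition enters the replay buffer, and the dynamic weight would scale the collaborative (action-prediction) term in the actor loss; both are sketches under the assumptions stated above rather than the paper's exact design.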