基于多智能体协同强化学习的多目标追踪方法

doi:10.19678/j.issn.1000-3428.0055904

计算机工程 ›› 2020, Vol. 46 ›› Issue (11): 90-96. doi: 10.19678/j.issn.1000-3428.0055904

基于多智能体协同强化学习的多目标追踪方法

王毅然¹, 经小川^1,2, 贾福凯², 孙宇健², 佟轶²

1. 中国航天系统科学与工程研究院, 北京 100048;
2. 航天宏康智能科技(北京)有限公司, 北京 100048

收稿日期:2019-09-03 修回日期:2019-11-11 发布日期:2020-11-10
作者简介:王毅然(1994-),男,硕士研究生,主研方向为目标跟踪、多智能体;经小川,研究员;贾福凯、孙宇健、佟轶,工程师。
基金资助:
广东省应用型科技研发基金（2016B010127005）。

Multi-Target Tracking Method Based on Multi-Agent Collaborative Reinforcement Learning

WANG Yiran¹, JING Xiaochuan^1,2, JIA Fukai², SUN Yujian², TONG Yi²

1. China Aerospace Academy of Systems Science and Engineering, Beijing 100048, China;
2. Aerospace Hongkang Intelligent Technology(Beijing) Co., Ltd., Beijing 100048, China

Received:2019-09-03 Revised:2019-11-11 Published:2020-11-10

摘要/Abstract

摘要： 针对现有多目标追踪方法通常存在学习速度慢、追踪效率低及协同追踪策略设计困难等问题，提出一种改进的多目标追踪方法。基于追踪智能体和目标智能体数量及其环境信息建立任务分配模型，运用匈牙利算法根据距离效益矩阵对其进行求解得到多个追踪智能体的任务分配情况，并以缩短目标智能体的追踪路径为优化目标进行任务分工，同时利用多智能体协同强化学习算法使多个智能体在相同环境中不断重复执行探索-积累-学习-决策过程，最终根据经验数据更新策略完成多目标追踪任务。仿真结果表明，与DDPG和MADDPG方法相比，该方法能在避免碰撞和躲避障碍物的情况下，使多个智能体通过相互协作形成针对多个运动目标的最短追踪路线。

关键词: 多智能体, 多目标追踪, 强化学习, 任务分配, 实时性

Abstract: There are multiple problems with existing multi-target tracking methods,including low learning speed,inefficient tracking and high difficulty in collaborative tracking strategy design.To this end,this paper proposes an improved multi-target tracking method.The method builds a task assignment model based on the number of target agents and tracking agents and their environmental information.Then the model is solved by using Hungary algorithm according to the distance benefit matrix to acquire the task assignment information of multiple tracking agents,which is optimized to shorten the tracking paths of target agents.In addition,the multi-agent collaborative reinforcement learning algorithm is used to enable multiple agents to repeat the process of exploration-accumulation-learning-decision in the same environment and update the strategy based on empirical data to finally complete the multi-target tracking task.Simulation results show that compared with DDPG and MADDPG methods,the proposed method enables multiple agents to collaboratively form the shortest path for tracking multiple moving targets with collisions and obstacles avoided.

Key words: multi-agent, multi-target tracking, reinforcement learning, task assignment, real-time

中图分类号:

TP391.1

王毅然, 经小川, 贾福凯, 孙宇健, 佟轶. 基于多智能体协同强化学习的多目标追踪方法[J]. 计算机工程, 2020, 46(11): 90-96.

WANG Yiran, JING Xiaochuan, JIA Fukai, SUN Yujian, TONG Yi. Multi-Target Tracking Method Based on Multi-Agent Collaborative Reinforcement Learning[J]. Computer Engineering, 2020, 46(11): 90-96.

https://www.ecice06.com/CN/Y2020/V46/I11/90

图/表 12

20201124085056

20201124085059

20201124085102

20201124085105

20201124085109

20201124085112

20201124085116

20201124085118

20201124085121

20201124085124

20201124085127

20201124085131

参考文献

[1] LAMINI C,FATHI Y,BENHLIMA S.Collaborative Q-learning path planning for autonomous robots based on holonic multi-agent system[C]//Proceedings of the 10th International Conference on Intelligent Systems:Theories and Applications.Washington D.C.,USA:IEEE Press,2015:1-6.
[2] HAJDUK M,SUKOP M,HAUN M.Agent approach to multi-agent systems[M]//HAJDUK M,SUKOP M,HAUN M.Studies in systems,decision and control.Berlin,Germany:Springer,2018:21-22.
[3] HAN Xiangmin,BAO Hong,LIANG Jun,et al.An adaptive cruise control algorithm based on deep reinforcement learning[J].Computing Engineering,2018,44(7):32-35.(in Chinese)韩向敏,鲍泓,梁军,等.一种基于深度强化学习的自适应巡航控制算法[J].计算机工程,2018,44(7):32-35.
[4] YU H L,MEIER K,ARGYLE M,et al.Cooperative path planning for target tracking in urban environments using unmanned air and ground vehicles[J].IEEE/ASME Transactions on Mechatronics,2015,20(2):541-552.
[5] YANG P,TANG K,LOZANO J A,et al.Path planning for single unmanned aerial vehicle by separately evolving waypoints[J].IEEE Transactions on Robotics,2015,31(5):1130-1146.
[6] ZHOU Hailing,KONG Hui,WEI Lei,et al.Efficient road detection and tracking for unmanned aerial vehicle[J].IEEE Transactions on Intelligent Transportation Systems,2015,16(1):297-309.
[7] SHAH K,SCHWAGER M.Multi-agent cooperative pursuit-evasion strategies under uncertainty[M]//CORRELL N,SCHWAGER M,OTTE M.Distributed autonomous robotic systems.Berlin,Germany:Springer,2019:451-468.
[8] GARCIA-FERNANDEZ A F,SVENSSON L.Multiple target tracking based on sets of trajectories[J].IEEE Transactions on Aerospace and Electronic Systems,2020,56(3):1685-1707.
[9] SOUIDI M E H S,SIAM A,PEI Z Y,et al.Multi-agent pursuit-evasion game based on organizational architecture[J].Journal of Computing and Information Technology,2019,27(1):1-11.
[10] DUAN Yong,XU Xinhe.Research on multi-robot cooperation strategy based on multi-agent reinforcement learning[J].Systems Engineering-Theory & Practice,2014,34(5):1305-1310.(in Chinese)段勇,徐心和.基于多智能体强化学习的多机器人协作策略研究[J].系统工程理论与实践,2014,34(5):1305-1310.
[11] GUPTA J K,EGOROV M,KOCHENDERFER M.Cooperative multi-agent control using deep reinforcement learning[C]//Proceedings of International Conference on Autonomous Agents and Multiagent Systems.Berlin,Germany:Springer,2017:66-83.
[12] WEI E,WICKE D,FREELAN D,et al.Multiagent soft Q-learning[C]//Proceedings of 2018 AAAI Spring Symposium Series.Palo Alto,USA:AAAI Press,2018:1-10.
[13] FOERSTER J,FARQUHAR G,AFOURAS T,et al.Counterfactual multi-agent policy gradients[EB/OL].[2019-08-01].https://arxiv.org/abs/1705.08926.
[14] LILLICRAP T,HUNT J,PRITZEL A,et al.Continuous control with deep reinforcement learning[EB/OL].[2019-08-01].https://arxiv.org/abs/1509.02971.
[15] YAN Yalin.Research on multi-robot pursuit-evasion problem based on game theory[D].Harbin:Harbin Engineering University,2014.(in Chinese)晏亚林.基于博弈论的多机器人追捕问题的研究[D].哈尔滨:哈尔滨工程大学,2014.
[16] FANG Baofu,PAN Qishu,HONG Bingrong,et al.Constraint conditions of successful capture in multi-pursuers vs one-evader games[J].Robot,2012,34(3):282-291.(in Chinese)方宝富,潘启树,洪炳镕,等.多追捕者-单-逃跑者追逃问题实现成功捕获的约束条件[J].机器人,2012,34(3):282-291.
[17] ZHANG Xu,LI Ling,JIA Leilei.Research and simulation of multi-robot pursuit and escape strategy based on differential game[J].Equipment Manufacturing Technology,2015(9):9-12.(in Chinese)张旭,李玲,贾磊磊.基于微分博弈的多机器人追逃策略研究及仿真[J].装备制造技术,2015(9):9-12.
[18] DU Wei,DING Shifei.Overview on multi-agent reinforcement learning[J].Computer Science,2019,46(8):1-8.(in Chinese)杜威,丁世飞.多智能体强化学习综述[J].计算机科学,2019,46(8):1-8.
[19] ZHANG Yue.Research on multi-agent deep reinforcement learning methods and applications[D].Xi'an:Xidian University,2018.(in Chinese)张悦.多智能体深度强化学习方法及应用研究[D].西安:西安电子科技大学,2018.
[20] WANG Weixun,HAO Jianye,WANG Yixi,et al.Towards cooperation in sequential prisoner's dilemmas:a deep multiagent reinforcement learning approach[EB/OL].[2019-08-01].https://arxiv.org/abs/1803.00162.
[21] LOWE R,WU Y,TAMAR A,et al.Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Proceedings of Advances in Neural Information Processing Systems.Berlin,Germany:Springer,2017:6379-6390.

选择文件类型/文献管理软件名称

选择包含的内容

基于多智能体协同强化学习的多目标追踪方法

Multi-Target Tracking Method Based on Multi-Agent Collaborative Reinforcement Learning

RichHTML

PDF

可视化

被引次数

摘要/Abstract

引用本文

使用本文

图/表 12

参考文献

相关文章 15

编辑推荐

Metrics

本文评价

[1]	石琼, 段辉, 师智斌. 基于深度强化学习的可信任务卸载方案[J]. 计算机工程, 2024, 50(8): 142-152.
[2]	钱清, 龙永, 蒋忠远, 段春红, 王宏. 基于深度强化学习的自适应图像隐写算法[J]. 计算机工程, 2024, 50(8): 319-327.
[3]	吴凡, 徐朝农, 邹英豪. 基于PD-NOMA的人员监控图像传输算法[J]. 计算机工程, 2024, 50(6): 266-275.
[4]	高家豪, 胡创业, 丁男, 刘战东. 智能网联汽车中联合驾驶风格的交通流数据有效性分析[J]. 计算机工程, 2024, 50(6): 367-376.
[5]	孙文洁, 李宗民, 孙浩淼. 基于图神经网络的多智能体强化学习值函数分解方法[J]. 计算机工程, 2024, 50(5): 62-70.
[6]	傅明建, 郭福强. 基于深度强化学习的无信号灯路口决策研究[J]. 计算机工程, 2024, 50(5): 91-99.
[7]	张斯力, 李梓健, 蔡瑞初, 郝志峰, 闫玉光. 基于因果机制约束的强化推荐系统[J]. 计算机工程, 2024, 50(5): 279-290.
[8]	张建强, 杨凯军, 欧阳凌丛. 具有规定性能的多智能体动态事件触发编队控制[J]. 计算机工程, 2024, 50(3): 78-88.
[9]	范晓宇, 贾新春, 李彬, 谢云飞. 多率采样机制下多智能体动态事件触发二分一致性研究[J]. 计算机工程, 2024, 50(3): 114-121.
[10]	冯雄波, 黄于欣, 赖华, 高玉梦. 基于多策略强化学习的低资源跨语言摘要方法研究[J]. 计算机工程, 2024, 50(2): 68-77.
[11]	杜海军, 余粟. 基于时空图注意力网络的服务机器人动态避障[J]. 计算机工程, 2024, 50(2): 105-112.
[12]	彭泫滈, 张娟, 李辉, 胡术. 基于改进狼群算法的无人机协同任务规划[J]. 计算机工程, 2024, 50(10): 69-79.
[13]	张俊娜, 韩超臣, 陈家伟, 赵晓焱, 袁培燕. 一种联合边缘服务器部署与服务放置的方法[J]. 计算机工程, 2024, 50(10): 266-280.
[14]	杨佳珠, 余芳, 杨勇生. 考虑能耗的海铁联运集装箱码头多设备协同调度[J]. 计算机工程, 2024, 50(10): 393-404.
[15]	孔月萍, 杨世海, 段梅梅, 丁泽诚, 方凯杰. 基于混合动作强化学习的电动汽车聚合商决策优化算法[J]. 计算机工程, 2024, 50(10): 418-428.

模态框（Modal）标题

选择文件类型/文献管理软件名称

选择包含的内容

基于多智能体协同强化学习的多目标追踪方法

Multi-Target Tracking Method Based on Multi-Agent Collaborative Reinforcement Learning

RichHTML

PDF

可视化

被引次数

摘要/Abstract

引用本文

使用本文

图/表 12

参考文献

相关文章 15

编辑推荐

Metrics

本文评价