Design of Multi-Agent Angle Tracking Method Based on Deep Reinforcement Learning

doi:10.19678/j.issn.1000-3428.0069710

Abstract

In intelligent situational awareness applications, multi-agent angle tracking problems arise whenever moving targets must be monitored and controlled. Unlike traditional target tracking, the angle tracking task requires not only tracking the spatial coordinates of each target but also determining the relative angles between targets. Existing control methods often become unstable or lose performance on such problems, which are large in scale and susceptible to environmental changes. To address this, the present study proposes a solution based on Multi-Agent Reinforcement Learning (MARL). First, a basic model of the multi-agent angle tracking problem is established. Then, a multi-level simulation decision-making framework is designed, and AR-MAPPO, a MARL algorithm better adapted to this problem, is proposed; it improves learning efficiency and model stability by dynamically adjusting the number of data reuse rounds. The experimental results show that, in multi-agent angle tracking tasks, the proposed method achieves higher convergence efficiency and better angle tracking performance than traditional methods and other reinforcement learning methods.

Keywords: intelligent decision system, artificial intelligence, deep reinforcement learning, Multi-Agent Reinforcement Learning (MARL), angle tracking
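
The abstract states that AR-MAPPO dynamically adjusts the number of data reuse rounds but does not give the adjustment rule on this page. The sketch below is only one plausible way such a rule could look, not the authors' method: it assumes the approximate KL divergence between the old and new policy is used as the signal, and the function name `adjust_reuse_epochs`, the `target_kl` threshold, and the epoch bounds are all hypothetical.

```python
def adjust_reuse_epochs(epochs, approx_kl, target_kl=0.01,
                        min_epochs=1, max_epochs=15):
    """Return the number of reuse rounds for the next PPO-style update.

    Hypothetical heuristic (not taken from the paper): reuse each batch
    fewer times when the policy moved too far on the last update, and
    more times when it barely moved.
    """
    if approx_kl > 1.5 * target_kl:
        epochs -= 1  # policy changed a lot: reuse the data less
    elif approx_kl < 0.5 * target_kl:
        epochs += 1  # policy changed little: the data can be reused more
    # Keep the result inside the allowed range.
    return max(min_epochs, min(max_epochs, epochs))


if __name__ == "__main__":
    epochs = 5
    # Made-up KL readings from three consecutive updates.
    for kl in (0.001, 0.012, 0.05):
        epochs = adjust_reuse_epochs(epochs, kl)
        print(f"approx_kl={kl:.3f} -> next reuse rounds: {epochs}")
```

Stepping by one round per update keeps the schedule from oscillating sharply; a real implementation would need to tune the thresholds against the training curves.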

BI Qian, QIAN Cheng, ZHANG Ke, WANG Cheng. Design of Multi-Agent Angle Tracking Method Based on Deep Reinforcement Learning[J]. Computer Engineering, 2024, 50(11): 10-17.

毕千, 钱程, 张可, 王成. 基于深度强化学习的多智能体角度跟踪方法设计[J]. 计算机工程, 2024, 50(11): 10-17.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069710

https://www.ecice06.com/EN/Y2024/V50/I11/10

Figures/Tables 13

Fig.1 Schematic diagram of multi-agent angle tracking

Fig.2 Simulation decision-making framework for multi-agent angle tracking

Fig.3 Schematic diagram of multi-agent angle tracking scenario

Fig.4 Average episode rewards of multi-agent training

Fig.5 N-S diagram of the manually designed multi-agent angle tracking method

Fig.6 Effectiveness in the multi-agent testing environment

Fig.7 Comparison of average episode rewards for AR-MAPPO, rMAPPO and MADDPG

Fig.8 Average episode rewards when training with different numbers of agents

Fig.9 Line chart of environmental non-stationarity test results

Fig.10 Box plot of environmental non-stationarity test results
