[1] BI Q, SUN H D, QIAN C, et al. An improved seeds scheme in K-means clustering algorithm for the UAVs control system application. IET Communications, 2024, 18(7): 437-449. doi: 10.1049/cmu2.12746
[2] MORSALI M, FRISK E, ÅSLUND J. Spatio-temporal planning in multi-vehicle scenarios for autonomous vehicle using support vector machines. IEEE Transactions on Intelligent Vehicles, 2021, 6(4): 611-621. doi: 10.1109/TIV.2020.3042087
[3] WANG Y R, JING X C, JIA F K, et al. Multi-target tracking method based on multi-agent collaborative reinforcement learning. Computer Engineering, 2020, 46(11): 90-96 (in Chinese). doi: 10.19678/j.issn.1000-3428.0055904
[4] CHU T S, WANG J, CODECÀ L, et al. Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(3): 1086-1095. doi: 10.1109/TITS.2019.2901791
[5] BRYSON A E, HO Y C, SIOURIS G M. Applied optimal control: optimization, estimation, and control. IEEE Transactions on Systems, Man, and Cybernetics, 1979, 9(6): 366-367. doi: 10.1109/TSMC.1979.4310229
[6] SEN S, WEISS G. Learning in multiagent systems[M]//Multiagent systems: a modern approach to distributed artificial intelligence. Cambridge: MIT Press, 1999: 259-298.
[7] STONE P, VELOSO M. Multiagent systems: a survey from a machine learning perspective. Autonomous Robots, 2000, 8(3): 345-383. doi: 10.1023/A:1008942012299
[8] YU C, VELU A, VINITSKY E, et al. The surprising effectiveness of PPO in cooperative, multi-agent games[EB/OL]. [2024-03-01]. http://arxiv.org/abs/2103.01955.
[9] ELMAN J L. Finding structure in time. Cognitive Science, 1990, 14(2): 179-211. doi: 10.1207/s15516709cog1402_1
[10]
[11] SUTTON R S, MCALLESTER D A, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Proceedings of Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1999: 1057-1063.
[12]
[13]
[14]
[15] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[EB/OL]. [2024-03-01]. https://arxiv.org/pdf/1801.01290.
[16] YAN J J, ZHANG Q S, HU X P. Review of path planning techniques based on reinforcement learning. Computer Engineering, 2021, 47(10): 16-25 (in Chinese). doi: 10.19678/j.issn.1000-3428.0060683
[17] RAO D N, LUO N Y. Stacker scheduling and repository location recommendation based on multi-task reinforcement learning. Computer Engineering, 2023, 49(2): 279-287, 295 (in Chinese). doi: 10.19678/j.issn.1000-3428.0063943
[18]
[19] RASHID T, SAMVELYAN M, DE WITT C S, et al. Monotonic value function factorisation for deep multi-agent reinforcement learning[EB/OL]. [2024-03-01]. https://arxiv.org/abs/2003.08839.
[20]
[21] LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[EB/OL]. [2024-03-01]. http://arxiv.org/abs/1706.02275.
[22]
[23] DE WITT C S, GUPTA T, MAKOVIICHUK D, et al. Is independent learning all you need in the StarCraft multi-agent challenge?[EB/OL]. [2024-03-01]. http://arxiv.org/abs/2011.09533.
[24] SCHULMAN J, MORITZ P, LEVINE S, et al. High-dimensional continuous control using generalized advantage estimation[EB/OL]. [2024-03-01]. https://arxiv.org/pdf/1506.02438.
[25]
[26] LI X, ZHA Y F, ZHANG T Z, et al. Survey of visual object tracking algorithms based on deep learning. Journal of Image and Graphics, 2019, 24(12): 2057-2080 (in Chinese).