[1] Foerster J N, Chen R Y, Al-Shedivat M, et al. Learning with
opponent-learning awareness[J]. arXiv preprint
arXiv:1709.04326, 2017.
[2] He H, Boyd-Graber J, Kwok K, et al. Opponent modeling in deep reinforcement learning[C]//International Conference on Machine Learning. PMLR, 2016: 1804-1813.
[3] Kim D K, Liu M, Riemer M D, et al. A policy gradient algorithm for learning to learn in multiagent reinforcement learning[C]//International Conference on Machine Learning. PMLR, 2021: 5541-5550.
[4] Wang T, Huang J S, Wang L T, et al. Multi-Antenna Phased Array Radar-Guided Search Resource Optimization Algorithm Based on MADDPG[J]. Computer Engineering, 2024, 50(11): 38-48.
[5] Shi W, Feng Y H, Cheng G Q, et al. Research on multi-aircraft cooperative air combat method based on deep reinforcement learning[J]. Acta Automatica Sinica, 2021, 47(7): 1610-1623. doi: 10.16383/j.aas.c201059.
[6] Jing Y, Li K, Liu B, et al. Towards Offline Opponent Modeling with In-context Learning[C]//The Twelfth International Conference on Learning Representations. 2024.
[7] Zintgraf L, Devlin S, Ciosek K, et al. Deep interactive Bayesian reinforcement learning via meta-learning[J]. arXiv preprint arXiv:2101.03864, 2021.
[8] Yu X, Jiang J, Zhang W, et al. Model-based opponent
modeling[J]. Advances in Neural Information Processing
Systems, 2022, 35: 28208-28221.
[9] Xu H T, Qin L, Zeng J J, et al. Research Progress of Opponent Modeling Based on Deep Reinforcement Learning[J]. Journal of System Simulation, 2023, 35(4): 671-694.
[10] Li S, Zhao H. A Survey on Representation Learning for User Modeling[C/OL]//Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan. 2020. http://dx.doi.org/10.24963/ijcai.2020/695.
[11] Rosman B, Hawasly M, Ramamoorthy S. Bayesian Policy Reuse[J/OL]. Machine Learning, 2016, 104(1): 99-127. http://dx.doi.org/10.1007/s10994-016-5547-y.
[12] Hernandez-Leal P, Taylor M E, Rosman B, et al. Identifying and Tracking Switching, Non-Stationary Opponents: A Bayesian Approach[C]//Workshops at the Thirtieth AAAI Conference on Artificial Intelligence. 2016.
[13] Zheng Y, Meng Z, Hao J, et al. A deep Bayesian policy reuse approach against non-stationary agents[J]. Advances in Neural Information Processing Systems, 2018, 31.
[14] Yang T, Meng Z, Hao J, et al. Towards efficient detection
and optimal response against sophisticated opponents[J]. arXiv
preprint arXiv:1809.04240, 2018.
[15] Hong Z W, Su S Y, Shann T Y, et al. A deep policy inference Q-network for multi-agent systems[J]. arXiv preprint arXiv:1712.07893, 2017.
[16] Al-Shedivat M, Bansal T, Burda Y, et al. Continuous
adaptation via meta-learning in nonstationary and competitive
environments[J]. arXiv preprint arXiv:1710.03641, 2017.
[17] Sohl-Dickstein J, Weiss E, Maheswaranathan N, et al. Deep unsupervised learning using nonequilibrium thermodynamics[C]//International Conference on Machine Learning. PMLR, 2015: 2256-2265.
[18] Rombach R, Blattmann A, Lorenz D, et al. High-Resolution Image Synthesis with Latent Diffusion Models[C/OL]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA. 2022. http://dx.doi.org/10.1109/cvpr52688.2022.01042.
[19] Kong Z, Ping W, Huang J, et al. DiffWave: A versatile diffusion model for audio synthesis[J]. arXiv preprint arXiv:2009.09761, 2020.
[20] Chen Z M, Guan Z T. Image Classification Adversarial Example Defense Method Based on Conditional Diffusion Model[J]. Computer Engineering, 2024, 50(12): 296-305.
[21] Janner M, Du Y, Tenenbaum J B, et al. Planning with
diffusion for flexible behavior synthesis[J]. arXiv preprint
arXiv:2205.09991, 2022.
[22] Wang Z, Hunt J J, Zhou M. Diffusion policies as an
expressive policy class for offline reinforcement learning[J].
arXiv preprint arXiv:2208.06193, 2022.
[23] Lu C, Ball P, Teh Y W, et al. Synthetic experience replay[J]. Advances in Neural Information Processing Systems, 2023, 36.
[24] Xian Z, Gkanatsios N, Gervet T, et al. ChainedDiffuser: Unifying trajectory diffusion and keypose prediction for robotic manipulation[C]//7th Annual Conference on Robot Learning. 2023.
[25] He H, Bai C, Xu K, et al. Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning[J]. Advances in Neural Information Processing Systems, 2023, 36: 64896-64917.
[26] Zhang M, Cai Z, Pan L, et al. MotionDiffuse: Text-driven human motion generation with diffusion model[J]. arXiv preprint arXiv:2208.15001, 2022.
[27] Ajay A, Du Y, Gupta A, et al. Is conditional generative
modeling all you need for decision-making?[J]. arXiv preprint
arXiv:2211.15657, 2022.
[28] Yang Y, Wang J. An overview of multi-agent
reinforcement learning from game theoretical perspective[J].
arXiv preprint arXiv:2011.00583, 2020.
[29] Zheng Q, Zhang A, Grover A. Online decision transformer[C]//International Conference on Machine Learning. PMLR, 2022: 27042-27059.
[30] Hussein A, Gaber M M, Elyan E, et al. Imitation learning:
A survey of learning methods[J]. ACM Computing Surveys
(CSUR), 2017, 50(2): 1-35.
[31] Agrawal P, Nair A V, Abbeel P, et al. Learning to poke by poking: Experiential learning of intuitive physics[J]. Advances in Neural Information Processing Systems, 2016, 29.
[32] Lanctot M, Lockhart E, Lespiau J B, et al. OpenSpiel: A framework for reinforcement learning in games[J]. arXiv preprint arXiv:1908.09453, 2019.
[33] Lowe R, Wu Y, Tamar A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[J]. Advances in Neural Information Processing Systems, 2017, 30.
[34] Papoudakis G, Christianos F, Albrecht S. Agent modelling
under partial observability for deep reinforcement learning[J].
Advances in Neural Information Processing Systems, 2021, 34:
19210-19222.
[35] Zintgraf L, Devlin S, Ciosek K, et al. Deep interactive Bayesian reinforcement learning via meta-learning[J]. arXiv preprint arXiv:2101.03864, 2021.
[36] Schulman J, Wolski F, Dhariwal P, et al. Proximal policy
optimization algorithms[J]. arXiv preprint arXiv:1707.06347,
2017.
[37] Prajapat M, Azizzadenesheli K, Liniger A, et al.
Competitive policy optimization[C]//Uncertainty in Artificial
Intelligence. PMLR, 2021: 64-74.