| 1 | HUANG K D, LIU B Q, HUANG J, et al. A survey of military simulation technologies[C]//Global Manufacturing Advanced Forum and 21st Century Simulation Technology Seminar. Beijing: China System Simulation Society, 2004: 80-89. (in Chinese) |
																													
| 2 | ZHAO H Y, ZHANG D G. Analysis of influencing factors on timeliness of battlefield command and control. Military Operations Research and Assessment, 2015, 29(2): 12-16, 49. (in Chinese) |
																													
| 3 | YIN Q, YE X B. The initially research for the method of operational design. National Defense Science & Technology, 2016, 37(1): 95-99. (in Chinese) |
																													
| 4 | CAO Z G, TAO S, HU X F, et al. Abroad wargaming deduction and system research. Journal of System Simulation, 2021, 33(9): 2059-2065. (in Chinese) |
																													
| 5 | LIU H Y, TANG Y B, HU X F, et al. Research on evaluation framework of COA based on wargaming. Journal of System Simulation, 2018, 30(11): 4115-4122, 4131. (in Chinese) |
																													
| 6 | SURDU J R. The deep green concept[C]//Proceedings of the 2008 Spring Simulation Multiconference. Berlin, Germany: Springer, 2008: 623-631. |
																													
| 7 | LI C X, GAO G Q, JU J X, et al. Study on equipment maintenance and security based on artificial intelligence depth enhancement. Journal of Ordnance Equipment Engineering, 2018, 39(2): 61-65. (in Chinese) |
																													
| 8 | ZHANG X H, CAO X W, GENG S T, et al. Research on intelligence of military auxiliary decision-making system based on deep learning. Journal of Ordnance Equipment Engineering, 2018, 39(10): 162-167. (in Chinese) |
																													
| 9 | YANG S M, SHAN Z, DING Y, et al. Survey of research on deep reinforcement learning. Computer Engineering, 2021, 47(12): 19-29. (in Chinese) |
																													
| 10 | XU J L, ZHANG H D, ZHAO D H, et al. Learning tactics and maneuvering strategies of marine chess based on convolutional neural network. Journal of System Simulation, 2022, 34(10): 2181-2193. (in Chinese) |
																													
																							| 11 |  | 
																													
| 12 | LIU Q, ZHAI J W, ZHANG Z Z, et al. A survey on deep reinforcement learning. Chinese Journal of Computers, 2018, 41(1): 1-27. (in Chinese) |
																													
| 13 | WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992, 8(3/4): 229-256. |
																													
| 14 | RIEDMILLER M. Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method[C]//Proceedings of the European Conference on Machine Learning. Berlin, Germany: Springer, 2005: 317-328. |
																													
| 15 | MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533. |
																													
| 16 | SUTTON R S. Learning to predict by the methods of temporal differences. Machine Learning, 1988, 3(1): 9-44. |
																													
| 17 | CAO J Q, LIU Q, ZHU F, et al. Gradient temporal-difference learning for off-policy evaluation using emphatic weightings. Information Sciences, 2021, 580: 311-330. |
																													
| 18 | YANG Z Y, MERRICK K, JIN L W, et al. Hierarchical deep reinforcement learning for continuous action control. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(11): 5174-5184. |
																													
| 19 | YAO T, WANG Y, DONG Y, et al. Application of deep reinforcement learning in operational mission planning. Aerospace Technology, 2020(4): 16-21. (in Chinese) |
																													
| 20 | MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: ACM Press, 2016: 1928-1937. |
																													
| 21 | ZHAO T T, HACHIYA H, NIU G, et al. Analysis and improvement of policy gradient estimation. Neural Networks, 2012, 26: 118-129. |
																													
																							| 22 |  | 
																													
																							| 23 |  | 
																													
																							| 24 | SCHULMAN J, LEVINE S, MORITZ P, et al. Trust region policy optimization[C]//Proceedings of the 32nd International Conference on Machine Learning. New York, USA: ACM Press, 2015: 1889-1897. | 
																													
																							| 25 |  | 
																													
| 26 | SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484-489. |
																													
| 27 | LI H. Research and implementation of man-machine game algorithm optimization for gobang[D]. Dalian: Dalian Maritime University, 2020. (in Chinese) |