[1] PEPPER R. Cisco visual networking index: global mobile data traffic forecast update[EB/OL]. [2021-09-30]. https://www.gsma.com/spectrum/wpcontent/uploads/2013/03/Cisco_VNI-global-mobile-data-traffic-forecastupdate.pdf.
[2] ROMERO D, LEUS G. Non-cooperative aerial base station placement via stochastic optimization[C]//Proceedings of the 15th International Conference on Mobile Ad-Hoc and Sensor Networks. Washington D.C., USA: IEEE Press, 2019: 131-136.
[3] ZENG Y, ZHANG R. Energy-efficient UAV communication with trajectory optimization[J]. IEEE Transactions on Wireless Communications, 2017, 16(6): 3747-3760.
[4] LYU J B, ZENG Y, ZHANG R, et al. Placement optimization of UAV-mounted mobile base stations[J]. IEEE Communications Letters, 2017, 21(3): 604-607.
[5] ALZENAD M, EL-KEYI A, LAGUM F, et al. 3-D placement of an unmanned aerial vehicle base station for energy-efficient maximal coverage[J]. IEEE Wireless Communications Letters, 2017, 6(4): 434-437.
[6] KALANTARI E, YANIKOMEROGLU H, YONGACOGLU A. On the number and 3D placement of drone base stations in wireless cellular networks[C]//Proceedings of the 84th IEEE Vehicular Technology Conference. Washington D.C., USA: IEEE Press, 2016: 1-6.
[7] AL-HOURANI A, KANDEEPAN S, LARDNER S. Optimal LAP altitude for maximum coverage[J]. IEEE Wireless Communications Letters, 2014, 3(6): 569-572.
[8] GUO J L, HUO Y H, SHI X J, et al. 3D aerial vehicle base station (UAV-BS) position planning based on deep Q-learning for capacity enhancement of users with different QoS requirements[C]//Proceedings of the 15th International Wireless Communications & Mobile Computing Conference. Washington D.C., USA: IEEE Press, 2019: 1508-1512.
[9] BAYERLEIN H, DE KERRET P, GESBERT D. Trajectory optimization for autonomous flying base station via reinforcement learning[C]//Proceedings of the 19th IEEE International Workshop on Signal Processing Advances in Wireless Communications. Washington D.C., USA: IEEE Press, 2018: 1-5.
[10] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[J]. Computer Science, 2013, 25: 253-262.
[11] WANG Q, ZHANG W Q, LIU Y W, et al. Multi-UAV dynamic wireless networking with deep reinforcement learning[J]. IEEE Communications Letters, 2019, 23(12): 2243-2246.
[12] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2016, 30(1): 14-20.
[13] LIU C H, MA X X, GAO X D, et al. Distributed energy-efficient multi-UAV navigation for long-term communication coverage by deep reinforcement learning[J]. IEEE Transactions on Mobile Computing, 2020, 19(6): 1274-1285.
[14] QI H, HU Z Q, HUANG H, et al. Energy efficient 3-D UAV control for persistent communication service and fairness: a deep reinforcement learning approach[J]. IEEE Access, 2020, 8: 53172-53184.
[15] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. [2021-09-30]. https://arxiv.org/abs/1509.02971.
[16] YANG S M, SHAN Z, CAO J, et al. Path planning of UAV base station based on deep reinforcement learning[J]. Procedia Computer Science, 2022, 202: 89-104.
[17] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[EB/OL]. [2021-09-30]. https://arxiv.org/abs/1802.09477.
[18] ANTHONY T, TIAN Z, BARBER D. Imagination-augmented agents for deep reinforcement learning[C]//Proceedings of Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2017: 5360-5370.
[19] NAGABANDI A, KAHN G, FEARING R S, et al. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning[C]//Proceedings of IEEE International Conference on Robotics and Automation. Washington D.C., USA: IEEE Press, 2018: 7559-7566.
[20] BUCKMAN J, HAFNER D, TUCKER G, et al. Sample-efficient reinforcement learning with stochastic ensemble value expansion[EB/OL]. [2021-09-30]. https://arxiv.org/abs/1807.01675.
[21] KURUTACH T, CLAVERA I, DUAN Y, et al. Model-ensemble trust-region policy optimization[EB/OL]. [2021-09-30]. https://arxiv.org/abs/1802.10592.
[22] FEINBERG V, WAN A, STOICA I, et al. Model-based value estimation for efficient model-free reinforcement learning[EB/OL]. [2021-09-30]. https://arxiv.org/abs/1803.00101.
[23] CLAVERA I, ROTHFUSS J, SCHULMAN J, et al. Model-based reinforcement learning via meta-policy optimization[EB/OL]. [2021-09-30]. https://arxiv.org/abs/1809.05214.
[24] Recommendation ITU-R. Propagation data and prediction methods required for the design of terrestrial broadband millimetric radio access systems operating in a frequency range of about 20-50 GHz[R]. Geneva, Switzerland: ITU, 2001.
[25] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[EB/OL]. [2021-09-30]. https://arxiv.org/abs/1606.01540.