[1] GUPTA L,JAIN R,VASZKUN G.Survey of important issues in UAV communication networks[J].IEEE Communications Surveys & Tutorials,2016,18(2):1123-1152.
[2] LI Haitao,LUO Jiawei,LIU Changjun.Selfish bandit-based cognitive anti-jamming strategy for aeronautic swarm network in presence of multiple jammer[J].IEEE Access,2019,7:30234-30243.
[3] LIN Yu,WANG Tianyu.UAV-assisted emergency communications:an extended multi-armed bandit perspective[J].IEEE Communications Letters,2019,23(5):938-941.
[4] SLIMENI F,CHTOUROU Z,SCHEERS B,et al.Cooperative Q-learning based channel selection for cognitive radio networks[J].Wireless Networks,2019,25(7):4161-4171.
[5] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[6] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[7] WANG S X,LIU H P,GOMES P H,et al.Deep reinforcement learning for dynamic multichannel access in wireless networks[J].IEEE Transactions on Cognitive Communications and Networking,2018,4(2):257-265.
[8] BHOWMIK M,MALATHI P.Spectrum sensing in cognitive radio using actor-critic neural network with krill herd-whale optimization algorithm[J].Wireless Personal Communications,2019,105(1):335-354.
[9] WEI Y F,YU F R,SONG M,et al.User scheduling and resource allocation in HetNets with hybrid energy supply:an actor-critic reinforcement learning approach[J].IEEE Transactions on Wireless Communications,2018,17(1):680-692.
[10] XIAO Liang,CHEN Tianhua,LIU Jinliang,et al.Anti-jamming transmission Stackelberg game with observation errors[J].IEEE Communications Letters,2015,19(6):949-952.
[11] AHMED I K,FAPOJUWO A O.Stackelberg equilibria of an anti-jamming game in cooperative cognitive radio networks[J].IEEE Transactions on Cognitive Communications and Networking,2018,4(1):121-134.
[12] XU Yifan,REN Guochun,CHEN Jin,et al.A one-leader multi-follower Bayesian-Stackelberg game for anti-jamming transmission in UAV communication networks[J].IEEE Access,2018,6:21697-21709.
[13] JIA Luliang,XU Yuhua,SUN Youming,et al.A multi-domain anti-jamming defense scheme in heterogeneous wireless networks[J].IEEE Access,2018,6:40177-40188.
[14] PARISI S,TANGKARATT V,PETERS J,et al.TD-regularized actor-critic methods[J].Machine Learning,2019,108(8/9):1467-1501.
[15] TINNIRELLO I,BIANCHI G,XIAO Y.Refinements on IEEE 802.11 distributed coordination function modeling approaches[J].IEEE Transactions on Vehicular Technology,2010,59(3):1055-1067.
[16] NAPARSTEK O,COHEN K.Deep multi-user reinforcement learning for distributed dynamic spectrum access[J].IEEE Transactions on Wireless Communications,2019,18(1):310-323.
[17] FANG Xiaojie,ZHANG Ning,ZHANG Shan,et al.On physical layer security:weighted fractional Fourier transform based user cooperation[J].IEEE Transactions on Wireless Communications,2017,16(8):5498-5510.
[18] HAARNOJA T,ZHOU A,ABBEEL P,et al.Soft actor-critic:off-policy maximum entropy deep reinforcement learning with a stochastic actor[EB/OL].[2019-11-01].https://arxiv.org/abs/1801.01290.
[19] O'DONOGHUE B,MUNOS R,KAVUKCUOGLU K,et al.PGQ:combining policy gradient and Q-learning[EB/OL].[2019-11-01].https://deepmind.com/research/publications/pgq-combining-policy-gradient-and-q-learning.
[20] MAO H Z,NETRAVALI R,ALIZADEH M.Neural adaptive video streaming with pensieve[C]//Proceedings of the Conference of the ACM Special Interest Group on Data Communication.New York,USA:ACM Press,2017:197-210.