[1] HU Bo,WANG Qiyao,FENG Hui,et al.Adaptive sensor scheduling algorithm for target tracking in wireless sensor networks[J].Journal of Electronics and Information Technology,2018,40(9):2033-2041.(in Chinese)胡波,王祺尧,冯辉,等.一种无线传感器网络中目标跟踪的自适应节点调度算法[J].电子与信息学报,2018,40(9):2033-2041.
[2] TESAURO G.TD-Gammon,a self-teaching backgammon program,achieves master-level play[J].Neural Computation,1994,6(2):215-219.
[3] LIU Feng,WANG Chongjun,LUO Bin.A probability-based value iteration on optimal policy algorithm for POMDP[J].Acta Electronica Sinica,2016,44(5):1078-1084.(in Chinese)刘峰,王崇骏,骆斌.一种基于最优策略概率分布的POMDP值迭代算法[J].电子学报,2016,44(5):1078-1084.
[4] SILVER D,VENESS J.Monte-Carlo planning in large POMDPs[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems.Cambridge,USA:MIT Press,2010:2164-2172.
[5] LITTMAN M L,CASSANDRA A R,KAELBLING L P.Learning policies for partially observable environments:scaling up[C]//Proceedings of the 12th International Conference on Machine Learning.San Francisco,USA:Morgan Kaufmann,1995:362-370.
[6] HAN Bing.The design and implementation of point-based POMDP policy iteration algorithm[D].Nanjing:Nanjing University,2014.(in Chinese)韩冰.基于点的POMDP策略迭代算法设计与实现[D].南京:南京大学,2014.
[7] LIU Yunlong,LI Renhou,LIU Jianshu.Q-learning algorithm based on predictive state representations[J].Journal of Xi'an Jiaotong University,2008,42(12):1472-1475.(in Chinese)刘云龙,李人厚,刘建书.基于预测状态表示的Q学习算法[J].西安交通大学学报,2008,42(12):1472-1475.
[8] LIU Quan,ZHAI Jianwei,ZHANG Zongzhang.A survey on deep reinforcement learning[J].Chinese Journal of Computers,2018,41(1):3-29.(in Chinese)刘全,翟建伟,章宗长.深度强化学习综述[J].计算机学报,2018,41(1):3-29.
[9] KARKUS P,HSU D,LEE W S.QMDP-Net:deep learning for planning under partial observability[EB/OL].[2019-11-04].https://arxiv.org/abs/1703.06692.
[10] YU Kai,JIA Lei,CHEN Yuqiang,et al.Deep learning:yesterday,today,and tomorrow[J].Journal of Computer Research and Development,2013,50(9):1799-1804.(in Chinese)余凯,贾磊,陈雨强,等.深度学习的昨天、今天和明天[J].计算机研究与发展,2013,50(9):1799-1804.
[11] HAARNOJA T,AJAY A,LEVINE S,et al.Backprop KF:learning discriminative deterministic state estimators[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems.Red Hook,USA:Curran Associates,2016:4376-4384.
[12] KIM W,LEE H,KIM H J.Predictive modeling of time-varying environmental information for path planning[C]//Proceedings of IEEE International Conference on Systems,Man,and Cybernetics.Washington D.C.,USA:IEEE Press,2013:3639-3644.
[13] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[14] TAMAR A,WU Y,THOMAS G,et al.Value iteration networks[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems.Red Hook,USA:Curran Associates,2016:2154-2162.
[15] SHANI G,PINEAU J,KAPLOW R.A survey of point-based POMDP solvers[J].Autonomous Agents and Multi-Agent Systems,2013,27(1):1-51.
[16] SONDIK E J.The optimal control of partially observable Markov processes over the infinite horizon:discounted costs[J].Operations Research,1978,26(2):282-304.
[17] MURPHY K P.A survey of POMDP solution techniques[EB/OL].[2019-11-04].https://www.researchgate.net/publication/2275247_A_survey_of_POMDP_solution_techniques.
[18] KOUTNÍK J,GREFF K,GOMEZ F,et al.A clockwork RNN[EB/OL].[2019-11-04].https://arxiv.org/abs/1402.3511.
[19] PASCANU R,MIKOLOV T,BENGIO Y.On the difficulty of training recurrent neural networks[C]//Proceedings of the 30th International Conference on Machine Learning.Atlanta,USA:JMLR.org,2013:1310-1318.
[20] CHO K,VAN MERRIENBOER B,GULCEHRE C,et al.Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.Stroudsburg,USA:Association for Computational Linguistics,2014:1724-1734.
[21] WERBOS P J.Backpropagation through time:what it does and how to do it[J].Proceedings of the IEEE,1990,78(10):1550-1560.