[1] LAGOUDAKIS M G, PARR R. Least-squares policy iteration[J]. Journal of Machine Learning Research, 2003, 4(6): 1107-1149.
[2] BUSONIU L, ERNST D, DE SCHUTTER B, et al. Online least-squares policy iteration for reinforcement learning control[C]//Proceedings of the 2010 American Control Conference. Washington D.C., USA: IEEE Press, 2010: 486-491.
[3] ZHOU Xin, LIU Quan, FU Qiming, et al. A batch least-squares policy iteration method[J]. Computer Science, 2014, 41(9): 232-238. (in Chinese)
[4] KRETCHMAR R M. Parallel reinforcement learning[C]//Proceedings of the World Conference on Systemics, Cybernetics and Informatics. Washington D.C., USA: IEEE Press, 2002: 60-74.
[5] YANG Xudong. Research on parallel reinforcement learning[D]. Suzhou: Soochow University, 2015. (in Chinese)
[6] TSUGUHISA T, YUUKI N, KOJI Y, et al. Basic research on speed-up of reinforcement learning using parallel processing for combination value function[J]. Procedia Computer Science, 2011, 6: 183-188.
[7] BARRETT E, DUGGAN J, HOWLEY E. A parallel framework for Bayesian reinforcement learning[J]. Connection Science, 2014, 26(1): 7-23.
[8] MENG Wei, HAN Xuedong. Research on parallel reinforcement learning algorithms and their applications[J]. Computer Engineering and Applications, 2009, 45(34): 25-28, 52. (in Chinese)
[9] MANNION P, DUGGAN J, HOWLEY E. Parallel reinforcement learning for traffic signal control[J]. Procedia Computer Science, 2015, 52: 956-961.
[10] GROUNDS M, KUDENKO D. Parallel reinforcement learning with linear function approximation[J]. Lecture Notes in Computer Science, 2005, 4865: 60-74.
[11] GROUNDS M J. Scaling-up reinforcement learning using parallelization and symbolic planning[EB/OL]. [2017-10-05]. https://core.ac.uk/download/pdf/42604945.pdf.
[12] GENG Xiaolong, LI Changjiang. Adaptive path planning using parallel reinforcement learning based on artificial neural networks[J]. Science Technology and Engineering, 2011, 11(4): 756-759. (in Chinese)
[13] KIM M S, HONG G G, LEE J J. Online fuzzy Q-learning with extended rule and interpolation technique[C]//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Washington D.C., USA: IEEE Press, 2002: 757-762.
[14] YU Jian, CHENG Qiansheng. The search range of the optimal number of clusters in fuzzy clustering methods[J]. Science in China (Series E), 2002, 32(2): 274-280. (in Chinese)
[15] KAUFMAN L, ROUSSEEUW P J. Finding groups in data: an introduction to cluster analysis[M]. New York, USA: John Wiley and Sons, 1990.