
Computer Engineering ›› 2006, Vol. 32 ›› Issue (6): 7-10.

• Doctoral Dissertation •

Impact of Experience Replay with Fixed History Length on Q-learning

LIN Ming, ZHU Jihong, SUN Zengqi   

  1. State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Online: 2006-03-20  Published: 2006-03-20

Abstract: To improve the learning efficiency of Q-learning, the idea of experience replay with a fixed history length is proposed and integrated into one-step Q-learning and Peng's Q(λ)-learning, respectively. The improved algorithms are investigated with different history lengths L on two learning tasks: the grid-world and mountain-car problems. Empirical results show that the improved one-step Q-learning is more efficient than the original algorithm in both tasks. The improved Peng's Q(λ) is quite sensitive to exploratory actions in a Markovian environment, where increasing L hardly enhances performance and may even degrade it. In a non-Markovian environment, however, it is less sensitive to exploratory actions, and increasing L significantly speeds up policy learning. These experimental findings provide guidance on choosing an appropriate history length L.

Key words: Experience replay; Reinforcement learning; Q-learning
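To make the idea concrete, the sketch below shows one way a fixed-length replay buffer could be combined with one-step Q-learning on a small tabular grid world. It is not the authors' implementation: the environment, the names FixedLengthReplay, GridWorld and q_learning_with_replay, and all parameter values are illustrative assumptions, and the Peng Q(λ) variant studied in the paper is not reproduced here.

import random
from collections import deque, defaultdict

class FixedLengthReplay:
    """Keeps only the most recent L transitions, so replay cost per step is bounded by L."""
    def __init__(self, history_length):
        self.buffer = deque(maxlen=history_length)  # oldest transitions are evicted automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def transitions(self):
        return list(self.buffer)

class GridWorld:
    """Tiny deterministic grid world: start at (0, 0), reward +1 on reaching the opposite corner."""
    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
    n_actions = 4

    def __init__(self, size=5):
        self.size = size
        self.goal = (size - 1, size - 1)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        dr, dc = self.MOVES[action]
        row = min(max(self.pos[0] + dr, 0), self.size - 1)
        col = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (row, col)
        done = self.pos == self.goal
        return self.pos, (1.0 if done else 0.0), done

def one_step_q_update(Q, transition, alpha, gamma, n_actions):
    """Standard one-step Q-learning backup for a single stored transition."""
    s, a, r, s_next, done = transition
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_with_replay(env, episodes, L, alpha=0.1, gamma=0.95, epsilon=0.1):
    """One-step Q-learning that, after every real step, replays the last L stored transitions."""
    Q = defaultdict(float)
    replay = FixedLengthReplay(L)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda b: Q[(s, b)])
            s_next, r, done = env.step(a)
            replay.add(s, a, r, s_next, done)
            # Replay the most recent L transitions (including the one just observed).
            for tr in replay.transitions():
                one_step_q_update(Q, tr, alpha, gamma, env.n_actions)
            s = s_next
    return Q

For example, Q = q_learning_with_replay(GridWorld(), episodes=200, L=10) trains on the toy grid; the sensitivity to the history length reported in the paper could then be probed by sweeping L while holding the other settings fixed.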