
Computer Engineering ›› 2006, Vol. 32 ›› Issue (6): 7-10.

• Doctoral Dissertation •

Impact of Experience Replay with Fixed History Length on Q-learning

LIN Ming, ZHU Jihong, SUN Zengqi   

  1. State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Online: 2006-03-20  Published: 2006-03-20

Abstract: To improve the learning efficiency of Q-learning, the idea of experience replay with a fixed history length is proposed and integrated into one-step Q-learning and Peng's Q(λ)-learning, respectively. The improved algorithms are investigated with different history lengths L on two learning tasks: the grid-world and mountain-car problems. Empirical results show that the improved one-step Q-learning is more efficient than the original algorithm in both tasks. The improved Peng's Q(λ) is quite sensitive to exploratory actions in a Markovian environment, where increasing L hardly enhances performance and may even degrade it. In a non-Markovian environment, however, it is less sensitive to exploratory actions, and increasing L significantly speeds up policy learning. These experimental findings provide guidance on choosing an appropriate history length L.

Key words: Experience replay; Reinforcement learning; Q-learning
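To make the idea concrete, the sketch below shows one way a fixed-length replay buffer could be combined with one-step Q-learning on a small tabular grid world. It is not the authors' implementation: the environment, the names FixedLengthReplay, GridWorld and q_learning_with_replay, and all parameter values are illustrative assumptions, and the Peng Q(λ) variant studied in the paper is not reproduced here.

import random
from collections import deque, defaultdict

class FixedLengthReplay:
    """Keeps only the most recent L transitions, so replay cost per step is bounded by L."""
    def __init__(self, history_length):
        self.buffer = deque(maxlen=history_length)  # oldest transitions are evicted automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def transitions(self):
        return list(self.buffer)

class GridWorld:
    """Tiny deterministic grid world: start at (0, 0), reward +1 on reaching the opposite corner."""
    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
    n_actions = 4

    def __init__(self, size=5):
        self.size = size
        self.goal = (size - 1, size - 1)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        dr, dc = self.MOVES[action]
        row = min(max(self.pos[0] + dr, 0), self.size - 1)
        col = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (row, col)
        done = self.pos == self.goal
        return self.pos, (1.0 if done else 0.0), done

def one_step_q_update(Q, transition, alpha, gamma, n_actions):
    """Standard one-step Q-learning backup for a single stored transition."""
    s, a, r, s_next, done = transition
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_with_replay(env, episodes, L, alpha=0.1, gamma=0.95, epsilon=0.1):
    """One-step Q-learning that, after every real step, replays the last L stored transitions."""
    Q = defaultdict(float)
    replay = FixedLengthReplay(L)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda b: Q[(s, b)])
            s_next, r, done = env.step(a)
            replay.add(s, a, r, s_next, done)
            # Replay the most recent L transitions (including the one just observed).
            for tr in replay.transitions():
                one_step_q_update(Q, tr, alpha, gamma, env.n_actions)
            s = s_next
    return Q

For example, Q = q_learning_with_replay(GridWorld(), episodes=200, L=10) trains on the toy grid; the sensitivity to the history length reported in the paper could then be probed by sweeping L while holding the other settings fixed.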