
Computer Engineering ›› 2007, Vol. 33 ›› Issue (18): 18-19,3. doi: 10.3969/j.issn.1000-3428.2007.18.006

• Doctoral Dissertation •

Average-Reward Reinforcement Learning: Theory, Algorithms, and Applications

黄炳强1,曹广益1,李建华2   

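  1. (1. Department of Automation, Shanghai Jiaotong University, Shanghai 200030; 2. Department of Computer Science, East China University of Science and Technology, Shanghai 200237)
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-09-20 Published:2007-09-20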

Average Reward Reinforcement Learning Theory, Algorithms and Its Application

HUANG Bing-qiang1, CAO Guang-yi1, LI Jian-hua2   

  1. (1. Department of Automation, Shanghai Jiaotong University, Shanghai 200030; 2. Department of Computer Science, East China University of Science and Technology, Shanghai 200237)
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-09-20 Published:2007-09-20

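Abstract (translated from the Chinese): Discounted-reward reinforcement learning is currently the mainstream of reinforcement learning research. However, the choice of discount factor makes near-term expected rewards count for more than long-term expected rewards, whereas a policy with a larger long-term expected reward may sometimes be the optimal one. A more reasonable approach is therefore average-reward reinforcement learning. This paper introduces the two main algorithms of average-reward reinforcement learning and its main applications.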

Keywords (translated from the Chinese): average-reward reinforcement learning, R-learning, H-learning

Abstract: Discounted reward reinforcement learning is the mainstream of reinforcement learning research, but the discount factor makes short-term rewards weigh more than long-term rewards. Sometimes, however, a policy with a larger long-term reward is the optimal one, so it is more reasonable to use average reward reinforcement learning. This paper presents the two main average reward reinforcement learning algorithms, R-learning and H-learning, and their main applications.

Key words: average reward reinforcement learning, R-learning, H-learning
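
The abstract's central argument, that a discount factor can favour a policy whose long-run reward is lower, can be illustrated with a small numerical sketch. The example below is not taken from the paper. The two reward cycles, the discount factors, and the function names `discounted_return` and `average_reward` are all assumptions chosen for illustration. Each policy is modelled simply as a reward sequence that repeats forever.

```python
from itertools import cycle, islice


def discounted_return(cycle_rewards, gamma, horizon=10_000):
    """Approximate the infinite-horizon discounted return of a repeating
    reward cycle by summing gamma**t * r_t over a long finite horizon."""
    total, weight = 0.0, 1.0
    for reward in islice(cycle(cycle_rewards), horizon):
        total += weight * reward
        weight *= gamma
    return total


def average_reward(cycle_rewards):
    """Long-run average reward per step of a repeating reward cycle."""
    return sum(cycle_rewards) / len(cycle_rewards)


# Hypothetical policies (illustrative numbers only):
# policy_a earns a small reward on every step;
# policy_b earns nothing for nine steps, then a large reward on the tenth.
policy_a = [1.0]
policy_b = [0.0] * 9 + [12.0]

if __name__ == "__main__":
    for name, rewards in (("A", policy_a), ("B", policy_b)):
        print(
            name,
            "average:", average_reward(rewards),
            "discounted (gamma=0.9):", discounted_return(rewards, 0.9),
            "discounted (gamma=0.99):", discounted_return(rewards, 0.99),
        )
```

With these made-up numbers, policy B has the higher average reward per step, yet the discounted criterion with gamma = 0.9 ranks policy A first. Raising gamma to 0.99 reverses the discounted ranking, which shows how sensitive the comparison is to the choice of discount factor. The average-reward criterion needs no such parameter.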
