Abstract:
Discounted-reward reinforcement learning is the mainstream of reinforcement learning research, but the discount factor makes short-term expected reward weigh more heavily than long-term expected reward. Sometimes, however, a policy with a larger long-term expected reward is optimal, and in such cases it is more reasonable to use the average-reward reinforcement learning method. This paper presents the two main algorithms of average-reward reinforcement learning, R-learning and H-learning, together with their main applications.
Key words:
average reward reinforcement learning,
R-learning,
H-learning
摘要: 折扣报酬模型强化学习是目前强化学习研究的主流,但折扣因子的选取使得近期期望报酬的影响大于远期期望报酬的影响,而有时候较大远期期望报酬的策略有可能是最优的,因此比较合理的方法是采用平均报酬模型强化学习。该文介绍了平均报酬模型强化学习的两个主要算法以及主要应用。
关键词:
平均报酬强化学习,
R学习,
H学习
HUANG Bing-qiang; CAO Guang-yi; LI Jian-hua. Average Reward Reinforcement Learning Theory Algorithms and Its Application[J]. Computer Engineering, 2007, 33(18): 18-19,3.
黄炳强;曹广益;李建华. 平均报酬模型强化学习理论、算法及应用[J]. 计算机工程, 2007, 33(18): 18-19,3.