Average Reward Reinforcement Learning Theory, Algorithms and Its Application

Computer Engineering, 2007, Vol. 33, Issue 18: 18-19,3. doi: 10.3969/j.issn.1000-3428.2007.18.006

• Degree Paper •

HUANG Bing-qiang¹, CAO Guang-yi¹, LI Jian-hua²

(1. Department of Automation, Shanghai Jiaotong University, Shanghai 200030; 2. Department of Computer Science, East China University of Science and Technology, Shanghai 200237)

Online: 2007-09-20  Published: 2007-09-20

Abstract

Discounted-reward reinforcement learning is the mainstream of current reinforcement learning research. Because of the discount factor, however, near-term expected reward weighs more heavily than long-term expected reward, whereas in some tasks a policy with a larger long-term expected reward is the optimal one. In such cases the average reward model is the more reasonable choice. This paper introduces the two main average reward reinforcement learning algorithms, R-learning and H-learning, and surveys their main applications.
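The contrast between the two criteria can be made concrete with a small worked example. The sketch below is not taken from the paper: the two reward cycles, the discount factors, and the helper names `discounted_return` and `average_reward` are all assumptions chosen for illustration. An agent either collects a reward of 1 on every step ("near") or a reward of 6 on every fourth step ("far").

```python
# Illustrative sketch only: the reward cycles and all names below are
# assumptions made for this example, not taken from the paper.

def discounted_return(cycle, gamma):
    """Discounted value of repeating a reward cycle forever.

    One pass through the cycle is worth sum(gamma**t * r_t); repeating it
    forever multiplies that by 1 / (1 - gamma**len(cycle)).
    """
    one_pass = sum(gamma ** t * r for t, r in enumerate(cycle))
    return one_pass / (1.0 - gamma ** len(cycle))


def average_reward(cycle):
    """Long-run reward per step when the cycle is repeated forever."""
    return sum(cycle) / len(cycle)


near = [1.0]                  # reward 1 on every step
far = [0.0, 0.0, 0.0, 6.0]    # reward 6 on every fourth step

for gamma in (0.5, 0.99):
    near_value = discounted_return(near, gamma)
    far_value = discounted_return(far, gamma)
    best = "near" if near_value > far_value else "far"
    print(f"gamma={gamma}: discounted criterion prefers {best}")

print("average reward per step:", average_reward(near), "vs", average_reward(far))
```

With a discount factor of 0.5 the discounted criterion prefers the "near" cycle, while with 0.99 it prefers "far"; the average reward criterion ranks "far" higher (1.5 against 1.0 per step) without any discount factor to tune.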

Key words: average reward reinforcement learning, R-learning, H-learning
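R-learning is commonly described as a tabular, model-free method that learns relative action values together with an estimate of the average reward per step. The following is a minimal sketch of one update in its commonly cited textbook form, not the paper's own presentation; the function name `r_learning_step`, the dictionary-of-lists table layout, and the step-size names `step_r` and `step_rho` are assumptions for illustration. H-learning is usually presented as a model-based method and is not sketched here.

```python
# Sketch of one tabular R-learning update. Names and table layout are
# assumptions for illustration; the paper's notation may differ.

def r_learning_step(R, rho, s, a, r, s_next, step_r=0.1, step_rho=0.01):
    """Apply one R-learning update to R in place and return the new rho.

    R       -- dict mapping each state to a list of relative action values
    rho     -- current estimate of the average reward per step
    s, a    -- state and action just taken
    r       -- reward observed
    s_next  -- resulting state
    """
    # Temporal-difference error measured relative to the average reward.
    delta = r - rho + max(R[s_next]) - R[s][a]
    R[s][a] += step_r * delta

    # Adjust rho only when the action taken is greedy after the update,
    # so that exploratory actions do not bias the average reward estimate.
    if R[s][a] == max(R[s]):
        rho += step_rho * (r - rho + max(R[s_next]) - max(R[s]))
    return rho


# Tiny usage example: two states, two actions each, all values start at 0.
R = {0: [0.0, 0.0], 1: [0.0, 0.0]}
rho = r_learning_step(R, rho=0.0, s=0, a=1, r=2.0, s_next=1)
print(R[0], rho)
```

In the usage example the chosen action becomes greedy after the update, so both the action value and the average reward estimate move toward the observed reward. In a full agent this step would sit inside an exploration loop, which is omitted here.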

CLC Number: TP24

HUANG Bing-qiang; CAO Guang-yi; LI Jian-hua. Average Reward Reinforcement Learning Theory Algorithms and Its Application[J]. Computer Engineering, 2007, 33(18): 18-19,3.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2007.18.006

http://www.ecice06.com/EN/Y2007/V33/I18/18
