Computer Engineering ›› 2018, Vol. 44 ›› Issue (11): 313-320. doi: 10.19678/j.issn.1000-3428.0048935

• Development Research and Engineering Application •

Nonparametric Approximation Strategy Iteration Parallel Reinforcement Learning Algorithm

JI Ting, ZHANG Hua

  1. Key Lab of Robot and Welding Automation of Jiangxi Province, Nanchang University, Nanchang 330031, China
  • Received: 2017-10-12 Online: 2018-11-15 Published: 2018-11-15
  • About the authors: JI Ting (born 1982), male, Ph.D. candidate, whose main research interests are intelligent robots and intelligent control; ZHANG Hua, professor.
  • Supported by:

    National High Technology Research and Development Program of China (SS2013AA041003)

Abstract: To address the slow convergence of online approximate policy iteration reinforcement learning algorithms, a nonparametric approximate policy iteration parallel reinforcement learning algorithm is proposed. The number of parallel units is determined by the sample collection process used to construct the learning units, and each reinforcement learning unit is designed on a linear approximation structure built from Radial Basis Functions (RBF). The units are then constructed autonomously by an estimation method whose objective is full coverage of the sample space, and each unit learns autonomously through approximate policy iteration. The units are fused by the average weighting method to obtain the overall policy of the algorithm. Simulation results on a single-stage inverted pendulum show that, compared with the online LSPI and BLSPI algorithms, the proposed algorithm maintains a high speedup ratio while achieving high efficiency, requires fewer control parameters, and converges faster.

Key words: parallel reinforcement learning, nonparametric, policy iteration, K-means clustering, inverted pendulum
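
The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: each unit is taken to hold Gaussian RBF centers, a shared width, and one weight vector per action; "average weighting" is read as an equal-weight mean of the units' Q-values; and all names (`rbf_features`, `fused_q_values`, the toy `units`) are hypothetical rather than taken from the paper.

```python
import math

def rbf_features(state, centers, width):
    """Gaussian RBF activation of `state` for each center, plus a constant bias."""
    feats = [
        math.exp(-sum((s - c) ** 2 for s, c in zip(state, center)) / (2.0 * width ** 2))
        for center in centers
    ]
    return feats + [1.0]

def unit_q_values(state, unit):
    """Q(s, a) of one learning unit: linear in the RBF features, one weight row per action."""
    phi = rbf_features(state, unit["centers"], unit["width"])
    return [sum(w * f for w, f in zip(row, phi)) for row in unit["weights"]]

def fused_q_values(state, units):
    """Equal-weight average of the units' Q-values (assumed reading of 'average weighting')."""
    per_unit = [unit_q_values(state, u) for u in units]
    n_actions = len(per_unit[0])
    return [sum(q[a] for q in per_unit) / len(per_unit) for a in range(n_actions)]

def greedy_action(state, units):
    """Index of the action with the largest fused Q-value."""
    q = fused_q_values(state, units)
    return max(range(len(q)), key=q.__getitem__)

# Hypothetical toy data: two units, one RBF center each, three actions.
units = [
    {"centers": [[0.0, 0.0]], "width": 1.0,
     "weights": [[1.0, 0.0], [0.0, 0.0], [0.0, 0.5]]},
    {"centers": [[0.0, 0.0]], "width": 1.0,
     "weights": [[0.0, 0.0], [0.0, 2.0], [0.0, 0.5]]},
]
state = [0.0, 0.0]  # e.g. (pendulum angle, angular velocity)

print(fused_q_values(state, units))  # [0.5, 1.0, 0.5]
print(greedy_action(state, units))   # 1
```

In this toy case the two units disagree on the best action (0 versus 1), and the averaged Q-values settle on action 1. Whether the paper fuses Q-values, weight vectors, or action choices is not stated in the abstract.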

CLC Number: