
Computer Engineering ›› 2022, Vol. 48 ›› Issue (8): 113-120. doi: 10.19678/j.issn.1000-3428.0061575

• Artificial Intelligence and Pattern Recognition •

  • About the authors: ZHAO Yinfu (1994-), male, M.S. candidate; his research interests include reinforcement learning and manipulator control. FENG Zhengyong (corresponding author), professor, Ph.D.
  • Funding:
    Talent Fund of China West Normal University (17YC046); Doctoral Scientific Research Startup Project of China West Normal University, "QoE Optimization of Streaming Media Transmission over Heterogeneous Wireless Networks" (13E003).

Fast Training Method for Manipulator Control Based on Deep Reinforcement Learning

ZHAO Yinfu, FENG Zhengyong   

  1. School of Electronic Information Engineering, China West Normal University, Nanchong, Sichuan 637009, China
  • Received:2021-05-08 Revised:2021-08-16 Published:2021-09-15


Abstract: Artificial Intelligence (AI) is widely used in robot control, and robot control algorithms are gradually shifting from model-driven to data-driven. Deep reinforcement learning can perceive and make decisions in complex environments and can solve manipulator control problems in high-dimensional, continuous state spaces. However, the data-driven training process in deep reinforcement learning relies heavily on GPU computing power and incurs a large training time cost. To address this problem, this study proposes a fast training method for manipulator control based on deep reinforcement learning that trains first on a simplified model (2D model) and then on a complex model (3D model). A Deep Deterministic Policy Gradient (DDPG) algorithm replaces the traditional inverse kinematics solution and drives the manipulator's end effector to the target position directly through data-driven training, thereby reducing the training time cost. Meanwhile, different settings are used for the state vector and for the form of the reward function. The final trained model is implemented and verified on a real manipulator. The results show that the control performance meets the application requirements of item sorting, and that the method shortens the average training time by nearly 52% compared with training directly on the 3D model.
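As context for the simplified 2D model described above, the following is a minimal, hypothetical sketch (not the paper's code) of a two-link planar arm's forward kinematics together with a dense, distance-based reward of the kind commonly used in DDPG reach tasks; the link lengths and the negative-Euclidean-distance reward form are illustrative assumptions:

```python
import math


def forward_kinematics_2d(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector (x, y) position of a 2-link planar arm.

    theta1 is the shoulder angle from the x-axis; theta2 is the
    elbow angle relative to the first link (both in radians).
    """
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def reach_reward(theta1, theta2, target, l1=1.0, l2=1.0):
    """Dense shaping reward: negative Euclidean distance from the
    end effector to the target point, maximal (zero) at the goal."""
    x, y = forward_kinematics_2d(theta1, theta2, l1, l2)
    return -math.hypot(x - target[0], y - target[1])
```

With both joint angles at zero the arm lies fully extended along the x-axis, so the end effector sits at (2.0, 0.0) and the reward for a target at that point is 0; any other pose yields a strictly negative reward, which gives the policy a smooth gradient toward the goal.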

Key words: manipulator, position control, Artificial Intelligence (AI), deep reinforcement learning, Deep Deterministic Policy Gradient (DDPG) algorithm

CLC number: