Fast Training Method for Manipulator Control Based on Deep Reinforcement Learning

doi:10.19678/j.issn.1000-3428.0061575

Abstract

Abstract: Artificial intelligence (AI) is widely used in robot control, and robot control algorithms are gradually shifting from model-driven to data-driven. Deep reinforcement learning can perceive and make decisions in complex environments and can solve manipulator control problems in high-dimensional, continuous state spaces. However, the data-driven training process in deep reinforcement learning currently relies heavily on GPU computing power and incurs a high training time cost. To address this problem, this study proposes a fast training method for manipulator control based on deep reinforcement learning that trains first on a simplified model (2D model) and then on a complex model (3D model). The Deep Deterministic Policy Gradient (DDPG) algorithm replaces the inverse kinematics solution used in traditional manipulator control algorithms and drives the manipulator's end effector to the target position directly through data-driven training, thereby reducing the training time cost. In addition, different settings are used for the forms of the state vector and the reward function. The final trained model is implemented and verified on a real manipulator. The results show that its control performance meets the application requirements of item sorting, and that the method shortens the average training time by nearly 52% compared with training directly in the 3D model.

Key words: manipulator, position control, artificial intelligence (AI), deep reinforcement learning, Deep Deterministic Policy Gradient (DDPG) algorithm
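
The abstract does not give the concrete state vector or reward function, so the following is only an illustrative sketch of what such a setting might look like for the simplified 2D stage. It assumes a planar two-link arm; the link lengths, the success tolerance, the choice of state components, and the reward shape (negative distance plus a bonus on reaching the target) are all hypothetical and are not taken from the paper.

```python
import math

LINK_LENGTHS = (1.0, 1.0)  # hypothetical link lengths, not from the paper
GOAL_TOLERANCE = 0.05      # hypothetical success radius, not from the paper


def end_effector(theta1, theta2, lengths=LINK_LENGTHS):
    """Forward kinematics of a planar two-link arm (joint angles in radians)."""
    l1, l2 = lengths
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def state_vector(theta1, theta2, target):
    """One possible state: joint angles plus the end-effector-to-target offset."""
    ex, ey = end_effector(theta1, theta2)
    return [theta1, theta2, target[0] - ex, target[1] - ey]


def reward(theta1, theta2, target):
    """One possible reward: negative distance, plus a bonus inside the tolerance."""
    ex, ey = end_effector(theta1, theta2)
    dist = math.hypot(target[0] - ex, target[1] - ey)
    bonus = 1.0 if dist < GOAL_TOLERANCE else 0.0
    return -dist + bonus
```

In a setup like this, a DDPG actor would map the state vector to joint-angle increments, and the reward alone would guide the end effector to the target without an explicit inverse kinematics solver.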

CLC number: TP18

ZHAO Yinfu, FENG Zhengyong. Fast Training Method for Manipulator Control Based on Deep Reinforcement Learning[J]. Computer Engineering, 2022, 48(8): 113-120.

https://www.ecice06.com/CN/Y2022/V48/I8/113

Figures/Tables: 12

