Design of Modular Soft-Hard Trunk Gripper Based on Reinforcement Learning Algorithm

doi:10.19678/j.issn.1000-3428.0252363

Abstract

This study designs and implements a novel modular bionic elephant-trunk gripper whose motion is controlled by a policy trained with the proximal policy optimization (PPO) reinforcement learning algorithm. The gripper combines rigid and flexible design principles in a modular structure, which improves the flexibility and scalability of the system. In the hardware design, the rigid parts provide structural strength and stability, while the flexible parts conform to objects of various shapes and hardness so that they can be grasped precisely. Using PPO for motion control, the study reproduces the complex motion behavior of an elephant trunk and applies the trained model to actual grasping tasks. A variety of grasping tasks were set up in a physics simulation environment, and through iterative PPO training the gripper gradually learned to adapt its motion trajectory to different environmental conditions and to grasp different objects accurately. Experimental results show that the modular bionic trunk gripper performs well across diverse grasping tasks, with a grasping success rate above 90% and smooth grasping motions. The results verify the effectiveness of PPO for complex robotic grasping tasks and suggest a new approach to the design and application of modular robot systems, with potential uses in intelligent manufacturing, medical assistance, and disaster relief.
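For readers unfamiliar with PPO, the core of the algorithm is a clipped surrogate objective that limits how far each policy update can move away from the policy that collected the data. The following is a minimal sketch of that objective in plain Python, not the authors' implementation. The function name `ppo_clip_objective`, its argument names, and the clipping range `clip_eps=0.2` are illustrative assumptions; the abstract does not state the hyperparameters used for the gripper.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    clip_eps: clipping range; 0.2 is an assumed value, not from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the clipped term caps the gain once the ratio exceeds 1 + clip_eps; with a negative advantage, the unclipped term is kept when it is lower, so harmful moves are still penalized in full. In practice the negative of this value would be minimized by a gradient-based optimizer together with value-function and entropy terms, which are omitted here.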

Jiachen Zhu, Ye Yang, Xiyou Hu, Jiaming Wang. Design of modular soft-hard trunk gripper based on reinforcement learning algorithm[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252363.

