
计算机工程 (Computer Engineering)



Zero-Action Imitation Learning for Robotic Arm via Predictive-Collaborative Optimization

  • Published: 2025-09-19


Abstract: Reinforcement learning for robot control is hampered by the difficulty of designing reward functions, while imitation learning, although it sidesteps reward engineering, relies on costly expert action data. To address both problems, this work proposes a zero-action imitation learning framework for robotic arms based on Predictive-Collaborative Optimization. The method combines model predictive control (MPC) with maximum a posteriori (MAP) Bayesian correction, achieving precise arm control through multi-step action-sequence optimization while eliminating the dependence on expert action data and hand-designed rewards. At its core, the framework uses the receding-horizon (rolling) optimization of MPC to minimize multi-step state error, dynamically adjusting the action sequence and improving robustness to noise and prediction uncertainty. Within each step, MAP estimation corrects the individual action using a prior distribution and a likelihood term, improving the local quality and efficiency of action optimization. Unlike conventional methods, the framework relies only on expert states rather than expert actions: a prediction model generates target states, which avoids the difficulty of collecting expert action data while also mitigating the accumulation of prediction error. Experiments show that the method outperforms existing baselines on a range of simulated robotic-arm tasks, improving average return by about 45.8% and reducing prediction error by about 50.7%; it demonstrates higher action-execution accuracy and better adaptability to complex environments, and achieves stable control on a real robotic-arm platform, confirming its cross-platform engineering potential.
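The abstract's core idea — state-only imitation via receding-horizon optimization of a learned dynamics model, with a Gaussian action prior playing the role of the MAP correction — can be illustrated with a minimal sketch. This is not the paper's implementation: the random-shooting optimizer, the function names (`mpc_state_imitation`, `dynamics`), and all parameter values are illustrative assumptions; the paper's own single-step MAP update and prediction model are not specified in the abstract.

```python
import numpy as np

def mpc_state_imitation(s0, expert_states, dynamics, horizon=5,
                        n_samples=256, action_dim=2,
                        prior_mean=None, prior_std=1.0,
                        prior_weight=0.1, seed=0):
    """Random-shooting MPC over a learned one-step model `dynamics(s, a)`.

    Picks the first action of the candidate sequence whose predicted
    rollout best tracks the expert *state* trajectory (no expert actions
    needed). A Gaussian prior over actions adds a MAP-style penalty that
    regularizes each step of the sequence.
    """
    rng = np.random.default_rng(seed)
    if prior_mean is None:
        prior_mean = np.zeros(action_dim)
    # Sample candidate action sequences around the prior mean.
    cand = rng.normal(prior_mean, prior_std,
                      size=(n_samples, horizon, action_dim))
    costs = np.zeros(n_samples)
    for i in range(n_samples):
        s = s0
        for t in range(horizon):
            s = dynamics(s, cand[i, t])                    # predicted next state
            costs[i] += np.sum((s - expert_states[t])**2)  # multi-step state error
            # MAP-style term: negative log of a Gaussian prior on the action
            costs[i] += prior_weight * np.sum((cand[i, t] - prior_mean)**2)
    best = int(np.argmin(costs))
    return cand[best, 0]  # receding horizon: execute only the first action

# Toy usage: integrator dynamics, expert states drifting toward (1, 1).
def dyn(s, a):
    return s + a

expert = [np.array([0.2, 0.2]) * (t + 1) for t in range(5)]
a0 = mpc_state_imitation(np.zeros(2), expert, dyn)
```

At each control step the whole procedure is rerun from the newly observed state, which is how the rolling optimization limits the accumulation of model prediction error described in the abstract.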