[1]Shehawy H, Pareyson D, Caruso V, et al. Flattening and folding towels with a single-arm robot based on reinforcement learning[J]. Robotics and Autonomous Systems, 2023, 169: 104506.
[2]闫皎洁, 张锲石, 胡希平. 基于强化学习的路径规划技术综述[J]. 计算机工程, 2021, 47(10): 16-25.
YAN Jiaojie, ZHANG Qieshi, HU Xiping. Review of Path Planning Techniques Based on Reinforcement Learning[J]. Computer Engineering, 2021, 47(10): 16-25.(in Chinese)
[3]Jang I, Noh S, Kim S, et al. An Analysis of the Impact of Dataset Characteristics on Data-Driven Reinforcement Learning for a Robotic Long-Horizon Task[C]//2023 14th International Conference on Information and Communication Technology Convergence (ICTC). Piscataway, NJ, USA: IEEE, 2023: 1681-1683.
[4]Cui Y, Xu Z, Zhong L, et al. A task-adaptive deep reinforcement learning framework for dual-arm robot manipulation[J]. IEEE Transactions on Automation Science and Engineering, 2025, 22: 466-479.
[5]赵寅甫, 冯正勇. 基于深度强化学习的机械臂控制快速训练方法[J]. 计算机工程, 2022, 48(8): 113-120.
ZHAO Yinfu, FENG Zhengyong. Fast Training Method for Manipulator Control Based on Deep Reinforcement Learning[J]. Computer Engineering, 2022, 48(8): 113-120.(in Chinese)
[6]Hu J, Stone P, Martín-Martín R. Causal policy gradient for whole-body mobile manipulation[EB/OL]. [2023-09-28]. https://arxiv.org/abs/2305.04866.
[7]Zhang H, Liang H, Cong L, et al. Reinforcement learning based pushing and grasping objects from ungraspable poses[C]//2023 IEEE International Conference on Robotics and Automation (ICRA). Piscataway, NJ, USA: IEEE, 2023: 3860-3866.
[8]Skrynnik A, Staroverov A, Aitygulov E, et al. Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations[J]. Knowledge-Based Systems, 2021, 218: 106844.
[9]Ramirez J, Yu W. Redundant robot control with learning from expert demonstrations[C]//2022 IEEE Symposium Series on Computational Intelligence (SSCI). Piscataway, NJ, USA: IEEE, 2022: 715-720.
[10]Yu Z, Feng Y, Liu L. Reward-free reinforcement learning algorithm using prediction network[M]. Amsterdam, Netherlands: IOS Press, 2020: 663-670.
[11]Florence P, Manuelli L, Tedrake R. Self-supervised correspondence in visuomotor policy learning[J]. IEEE Robotics and Automation Letters, 2020, 5(2): 492-499.
[12]Siegel N Y, Springenberg J T, Berkenkamp F, et al. Keep doing what worked: Behavioral modelling priors for offline reinforcement learning[EB/OL]. [2020-06-17]. https://arxiv.org/abs/2002.08396.
[13]Lu G, Yu T, Deng H, et al. AnyBimanual: Transferring unimanual policy for general bimanual manipulation[EB/OL]. [2024-12-09]. https://arxiv.org/abs/2412.06779.
[14]张超, 白文松, 杜歆, 等. 模仿学习综述: 传统与新进展[J]. 中国图象图形学报, 2023, 28(6): 1585-1607.
ZHANG Chao, BAI Wensong, DU Xin, et al. Survey of Imitation Learning: Tradition and New Advances[J]. Journal of Image and Graphics, 2023, 28(6): 1585-1607.(in Chinese)
[15]Ramirez J, Yu W. Reinforcement learning from expert demonstrations with application to redundant robot control[J]. Engineering Applications of Artificial Intelligence, 2023, 119: 105753.
[16]Gómez P I, Gajardo M E L, Mijatovic N, et al. Enhanced Imitation Learning of Model Predictive Control Through Importance Weighting[J]. IEEE Transactions on Industrial Electronics, 2025, 72(4): 4073-4083.
[17]Lambert N O. Synergy of Prediction and Control in Model-based Reinforcement Learning[D]. Berkeley: University of California, 2022.
[18]Duan A, Batzianoulis I, Camoriano R, et al. A structured prediction approach for robot imitation learning[J]. The International Journal of Robotics Research, 2024, 43(2): 113-133.
[19]Krishnan K G. Using deep reinforcement learning for robot arm control[J]. Journal of Artificial Intelligence and Capsule Networks, 2022, 4(3): 160-166.
[20]Franceschetti A, Tosello E, Castaman N, et al. Robotic arm control and task training through deep reinforcement learning[C]//International Conference on Intelligent Autonomous Systems. Cham: Springer International Publishing, 2021: 532-550.
[21]Lin Q, Ling Q. Robust Reward-Free Actor-Critic for Cooperative Multiagent Reinforcement Learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(12): 17318-17329.
[22]Chang J, Uehara M, Sreenivas D, et al. Mitigating covariate shift in imitation learning via offline data with partial coverage[J]. Advances in Neural Information Processing Systems, 2021, 34: 965-979.
[23]Hoque R, Mandlekar A, Garrett C, et al. IntervenGen: Interventional data generation for robust and data-efficient robot imitation learning[C]//2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway, NJ, USA: IEEE, 2024: 2840-2846.
[24]Spencer J, Choudhury S, Barnes M, et al. Expert intervention learning: An online framework for robot learning from explicit and implicit human feedback[J]. Autonomous Robots, 2022, 46(1): 99-113.
[25]Reichlin A, Marchetti G L, Yin H, et al. Back to the manifold: Recovering from out-of-distribution states[C]//2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway, NJ, USA: IEEE, 2022: 8660-8666.
[26]Hassan M S, Sanaullah S M. Robotic Arm Manipulation with Inverse Reinforcement Learning & TD-MPC[EB/OL]. [2024-08-07]. https://arxiv.org/abs/2407.12941.
[27]宋莉, 李大字, 徐昕. 逆强化学习算法、理论与应用研究综述[J]. 自动化学报, 2024, 50(9): 1704-1723.
SONG Li, LI Dazi, XU Xin. A Survey of Inverse Reinforcement Learning Algorithms, Theory and Applications[J]. Acta Automatica Sinica, 2024, 50(9): 1704-1723.(in Chinese)
[28]Wang B, Adeli E, Chiu H, et al. Imitation learning for human pose prediction[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway, NJ, USA: IEEE, 2019: 7124-7133.
[29]Cheng C A, Yan X, Theodorou E, et al. Accelerating imitation learning with predictive models[C]//The 22nd International Conference on Artificial Intelligence and Statistics. Cambridge, MA: PMLR, 2019: 3187-3196.
[30]Ahn K, Mhammedi Z, Mania H, et al. Model predictive control via on-policy imitation learning[C]//Learning for Dynamics and Control Conference. Cambridge, MA: PMLR, 2023: 1493-1505.
[31]Tunyasuvunakool S, Muldal A, Doron Y, et al. dm_control: Software and tasks for continuous control[J]. Software Impacts, 2020, 6: 100022.
[32]Vecchietti L F, Seo M, Har D. Sampling rate decay in hindsight experience replay for robot control[J]. IEEE Transactions on Cybernetics, 2022, 52(3): 1515-1526.
[33]Zakka K, Tabanpour B, Liao Q, et al. MuJoCo Playground[EB/OL]. [2025-02-12]. https://arxiv.org/abs/2502.08844.
[34]Setiaji B, Pujastuti E, Filza M F, et al. Implementation of reinforcement learning in 2D based games using OpenAI Gym[C]//2022 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS). Piscataway, NJ, USA: IEEE, 2022: 293-297.
[35]Sekar R, Rybkin O, Daniilidis K, et al. Planning to explore via self-supervised world models[C]//International Conference on Machine Learning. Cambridge, MA: PMLR, 2020: 8583-8592.
[36]Hejna J, Sadigh D. Inverse preference learning: Preference-based RL without a reward function[J]. Advances in Neural Information Processing Systems, 2023, 36: 18806-18827.
[37]Agarwal R, Schwarzer M, Castro P S, et al. Deep reinforcement learning at the edge of the statistical precipice[J]. Advances in Neural Information Processing Systems, 2021, 34: 29304-29320.