
Computer Engineering ›› 2025, Vol. 51 ›› Issue (3): 86-94. doi: 10.19678/j.issn.1000-3428.0068626

• Artificial Intelligence and Pattern Recognition •

Surface Coverage Method Based on PPO with Adaptive Reward Function

LI Shuyi1, YANG Bo2, CHEN Ling2, SHEN Ling1, TANG Wensheng1,*()

  1. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, Hunan, China
    2. College of Engineering and Design, Hunan Normal University, Changsha 410081, Hunan, China
  • Received: 2023-10-19 Online: 2025-03-15 Published: 2024-05-21
  • Corresponding author: TANG Wensheng
  • Supported by:
    Youth Program of the National Natural Science Foundation of China (62203167)

Surface Coverage Method Based on PPO with Adaptive Reward Function

LI Shuyi1, YANG Bo2, CHEN Ling2, SHEN Ling1, TANG Wensheng1,*()   

  1. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, Hunan, China
    2. College of Engineering and Design, Hunan Normal University, Changsha 410081, Hunan, China
  • Received:2023-10-19 Online:2025-03-15 Published:2024-05-21
  • Contact: TANG Wensheng

Abstract:

To address the problems that existing surface coverage methods in robotic cleaning operations struggle to adapt to surface changes and have low coverage efficiency, a Proximal Policy Optimization (PPO) surface coverage method with an adaptive reward function (SC-SRPPO) is proposed. First, the target surface is discretized, the covariance matrix of each local neighborhood is obtained via spherical query to solve for the normal vectors of the point cloud, and a 3D surface model is established. Second, a state model is constructed by taking the coverage-state features and curvature-change features of the local surface point cloud as the observations of the surface model, which helps the robot's trajectory fit the surface and improves its adaptability to surface changes. Next, an adaptive reward function is constructed from the global coverage rate of the surface and a time-related exponential model, guiding the robot toward uncovered regions and improving coverage efficiency. Finally, the local surface state model, the reward function, and the PPO reinforcement learning algorithm are combined to train the robot to complete the surface coverage path planning task. Experiments on three surface models (spherical, saddle-shaped, and 3D heart-shaped), with point-cloud coverage rate and coverage completion time as the main evaluation metrics, show that the average coverage rate of SC-SRPPO is 90.72%; compared with NSGA Ⅱ, PPO, and SAC, the coverage rate is improved by 4.98%, 14.56%, and 27.11%, respectively, and the coverage completion time is shortened by 15.20%, 67.18%, and 62.64%, respectively. SC-SRPPO enables the robot to complete surface coverage tasks more efficiently while adapting to surface changes.
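As a rough illustration of the surface-modelling step described above (discretize the surface, query a spherical neighborhood around each point, build the covariance matrix of that neighborhood, and take its smallest-eigenvalue eigenvector as the point normal), the following Python sketch uses NumPy and SciPy; the function name estimate_normals and the radius parameter are illustrative assumptions, not details taken from the paper.

    # Sketch only: normal estimation by ball query + covariance eigendecomposition.
    import numpy as np
    from scipy.spatial import cKDTree

    def estimate_normals(points: np.ndarray, radius: float = 0.05) -> np.ndarray:
        """points: (N, 3) array of a discretized surface; returns (N, 3) unit normals."""
        tree = cKDTree(points)
        normals = np.zeros_like(points)
        for i, p in enumerate(points):
            idx = tree.query_ball_point(p, r=radius)   # spherical (ball) neighborhood query
            neigh = points[idx]
            if len(idx) < 3:                           # too few neighbors: fall back to a default
                normals[i] = np.array([0.0, 0.0, 1.0])
                continue
            cov = np.cov(neigh.T)                      # 3x3 covariance matrix of the local patch
            eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
            n = eigvecs[:, 0]                          # smallest-eigenvalue direction ~ surface normal
            normals[i] = n / np.linalg.norm(n)
        return normals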

Keywords: cleaning robot, curved surface, coverage path planning, reinforcement learning, proximal policy optimization

Abstract:

Existing surface coverage methods struggle to adapt to surface changes, and their coverage efficiency in robot cleaning operations is low. This paper proposes a surface coverage method based on Proximal Policy Optimization (PPO) with an adaptive reward function, namely SC-SRPPO. First, the target surface is discretized and the covariance matrix of each local neighborhood is obtained via spherical query to solve for the normal vectors of the point cloud, which are then used to establish the 3D surface model. Second, a state model is constructed using the coverage-state and curvature-change features of the local surface point cloud as the observations of the surface model, which helps the robot's trajectory fit the surface during movement and improves the robot's adaptability to surface changes. Subsequently, an adaptive reward function is constructed based on the global coverage rate of the surface and a time-related exponential model to guide the robot toward uncovered areas as soon as possible and improve coverage efficiency. Finally, the local surface state model and the reward function are combined with the PPO reinforcement learning algorithm to train the robot to complete surface coverage path planning. Experiments are conducted on three surface models (spherical, saddle-shaped, and 3D heart-shaped) with point-cloud coverage rate and coverage completion time as the main evaluation metrics. The average coverage rate of SC-SRPPO is 90.72%; compared with NSGA Ⅱ, PPO, and SAC, the coverage rate increases by 4.98%, 14.56%, and 27.11%, respectively, while the coverage completion time is reduced by 15.20%, 67.18%, and 62.64%, respectively. The results show that SC-SRPPO enables the robot to complete surface coverage tasks more efficiently while adapting to surface changes.
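To make the adaptive reward idea concrete, a minimal sketch is given below under an assumed functional form: the reward grows with newly covered points and with the global coverage rate, and is modulated by an exponential term in elapsed time so that progress made early is worth more, nudging the agent toward uncovered regions quickly. The exact expression and the constants alpha and tau are assumptions for illustration, not the paper's definition.

    # Sketch only: one plausible shape for a coverage- and time-dependent reward.
    import math

    def adaptive_reward(newly_covered: int, total_points: int,
                        global_coverage: float, step: int,
                        alpha: float = 1.0, tau: float = 200.0) -> float:
        # Fraction of the surface gained at this step: rewards moving into uncovered regions.
        gain = newly_covered / total_points
        # Time-related exponential weighting: the same gain is worth more early in the episode.
        time_factor = math.exp(-step / tau)
        return alpha * (gain + global_coverage) * time_factor

    # Example call with hypothetical values:
    # r = adaptive_reward(newly_covered=12, total_points=2048, global_coverage=0.37, step=150)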

Key words: cleaning robot, surface, coverage path planning, Reinforcement Learning (RL), Proximal Policy Optimization (PPO)