
Computer Engineering ›› 2025, Vol. 51 ›› Issue (3): 86-94. doi: 10.19678/j.issn.1000-3428.0068626

• Artificial Intelligence and Pattern Recognition •

Surface Coverage Method Based on PPO with Adaptive Reward Function

LI Shuyi1, YANG Bo2, CHEN Ling2, SHEN Ling1, TANG Wensheng1,*()

  1. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, Hunan, China
    2. College of Engineering and Design, Hunan Normal University, Changsha 410081, Hunan, China
  • Received: 2023-10-19 Online: 2025-03-15 Published: 2024-05-21
  • Corresponding author: TANG Wensheng
  • Supported by:
    Youth Program of the National Natural Science Foundation of China (62203167)

Surface Coverage Method Based on PPO with Adaptive Reward Function

LI Shuyi1, YANG Bo2, CHEN Ling2, SHEN Ling1, TANG Wensheng1,*()   

  1. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, Hunan, China
    2. College of Engineering and Design, Hunan Normal University, Changsha 410081, Hunan, China
  • Received:2023-10-19 Online:2025-03-15 Published:2024-05-21
  • Contact: TANG Wensheng

Abstract:

To address the problems that existing surface coverage methods in robotic cleaning operations struggle to adapt to surface changes and have low coverage efficiency, a Proximal Policy Optimization (PPO) surface coverage method with an adaptive reward function (SC-SRPPO) is proposed. First, the target surface is discretized, the covariance matrix of each local neighborhood is obtained via spherical query to solve for the normal vectors of the point cloud, and a 3D surface model is established. Second, a state model is constructed by taking the coverage-state features and curvature-change features of the local surface point cloud as the observations of the surface model, which helps the robot's trajectory fit the surface and improves its adaptability to surface changes. Next, an adaptive reward function is constructed from the global coverage rate of the surface and a time-related exponential model, guiding the robot toward uncovered regions and improving coverage efficiency. Finally, the local surface state model, the reward function, and the PPO reinforcement learning algorithm are combined to train the robot to complete the surface coverage path planning task. Experiments on three surface models (spherical, saddle-shaped, and 3D heart-shaped), with point-cloud coverage rate and coverage completion time as the main evaluation metrics, show that the average coverage rate of SC-SRPPO is 90.72%; compared with NSGA Ⅱ, PPO, and SAC, the coverage rate is improved by 4.98%, 14.56%, and 27.11%, respectively, and the coverage completion time is shortened by 15.20%, 67.18%, and 62.64%, respectively. SC-SRPPO enables the robot to complete surface coverage tasks more efficiently while adapting to surface changes.
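As a rough illustration of the surface-modelling step described above (discretize the surface, query a spherical neighborhood around each point, build the covariance matrix of that neighborhood, and take its smallest-eigenvalue eigenvector as the point normal), the following Python sketch uses NumPy and SciPy; the function name estimate_normals and the radius parameter are illustrative assumptions, not details taken from the paper.

    # Sketch only: normal estimation by ball query + covariance eigendecomposition.
    import numpy as np
    from scipy.spatial import cKDTree

    def estimate_normals(points: np.ndarray, radius: float = 0.05) -> np.ndarray:
        """points: (N, 3) array of a discretized surface; returns (N, 3) unit normals."""
        tree = cKDTree(points)
        normals = np.zeros_like(points)
        for i, p in enumerate(points):
            idx = tree.query_ball_point(p, r=radius)   # spherical (ball) neighborhood query
            neigh = points[idx]
            if len(idx) < 3:                           # too few neighbors: fall back to a default
                normals[i] = np.array([0.0, 0.0, 1.0])
                continue
            cov = np.cov(neigh.T)                      # 3x3 covariance matrix of the local patch
            eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
            n = eigvecs[:, 0]                          # smallest-eigenvalue direction ~ surface normal
            normals[i] = n / np.linalg.norm(n)
        return normals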

Keywords: cleaning robot, curved surface, coverage path planning, reinforcement learning, proximal policy optimization

Abstract:

Existing surface coverage methods struggle to adapt to surface changes, and their coverage efficiency in robot cleaning operations is low. This paper proposes a surface coverage method based on Proximal Policy Optimization (PPO) with an adaptive reward function, namely SC-SRPPO. First, the target surface is discretized and the covariance matrix of each local neighborhood is obtained via spherical query to solve for the normal vectors of the point cloud, which are then used to establish the 3D surface model. Second, a state model is constructed using the coverage-state and curvature-change features of the local surface point cloud as the observations of the surface model, which helps the robot's trajectory fit the surface during movement and improves the robot's adaptability to surface changes. Subsequently, an adaptive reward function is constructed based on the global coverage rate of the surface and a time-related exponential model to guide the robot toward uncovered areas as soon as possible and improve coverage efficiency. Finally, the local surface state model and the reward function are combined with the PPO reinforcement learning algorithm to train the robot to complete surface coverage path planning. Experiments are conducted on three surface models (spherical, saddle-shaped, and 3D heart-shaped) with point-cloud coverage rate and coverage completion time as the main evaluation metrics. The average coverage rate of SC-SRPPO is 90.72%; compared with NSGA Ⅱ, PPO, and SAC, the coverage rate increases by 4.98%, 14.56%, and 27.11%, respectively, while the coverage completion time is reduced by 15.20%, 67.18%, and 62.64%, respectively. The results show that SC-SRPPO enables the robot to complete surface coverage tasks more efficiently while adapting to surface changes.
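To make the adaptive reward idea concrete, a minimal sketch is given below under an assumed functional form: the reward grows with newly covered points and with the global coverage rate, and is modulated by an exponential term in elapsed time so that progress made early is worth more, nudging the agent toward uncovered regions quickly. The exact expression and the constants alpha and tau are assumptions for illustration, not the paper's definition.

    # Sketch only: one plausible shape for a coverage- and time-dependent reward.
    import math

    def adaptive_reward(newly_covered: int, total_points: int,
                        global_coverage: float, step: int,
                        alpha: float = 1.0, tau: float = 200.0) -> float:
        # Fraction of the surface gained at this step: rewards moving into uncovered regions.
        gain = newly_covered / total_points
        # Time-related exponential weighting: the same gain is worth more early in the episode.
        time_factor = math.exp(-step / tau)
        return alpha * (gain + global_coverage) * time_factor

    # Example call with hypothetical values:
    # r = adaptive_reward(newly_covered=12, total_points=2048, global_coverage=0.37, step=150)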

Key words: cleaning robot, surface, coverage path planning, Reinforcement Learning (RL), Proximal Policy Optimization (PPO)