
Computer Engineering ›› 2024, Vol. 50 ›› Issue (1): 313-319. doi: 10.19678/j.issn.1000-3428.0066193

• Development Research and Engineering Application •


Reinforcement Learning Navigation Method Based on Advantage Hindsight Experience Replay

Shaotong WANG, Liqun KUANG*, Huiyan HAN, Fengguang XIONG, Hongxin XUE

  1. School of Computer Science and Technology, North University of China, Taiyuan 030051, Shanxi, China
  • Received: 2022-11-06  Online: 2024-01-15  Published: 2024-01-11
  • Contact: Liqun KUANG
  • Supported by: National Natural Science Foundation of China (62106238); Scientific Research Project for Returned Overseas Scholars of Shanxi Province (2020-113); Special Guidance Project for the Transformation of Scientific Achievements of Shanxi Province (202104021301055)


Abstract:

Reinforcement learning demonstrates significant potential in the field of mobile robots. By combining reinforcement learning algorithms with robot navigation, autonomous navigation of mobile robots can be achieved without relying on prior knowledge. However, robot reinforcement learning suffers from low sample utilization and poor generalization ability. To address these problems, this paper proposes an advantage hindsight experience replay algorithm, built on the D3QN algorithm, for replaying experience samples. First, the advantage value of each point in a trajectory sample is computed, and the point with the maximum advantage value is selected as the new goal. The trajectory sample is then relabeled, and both the original and relabeled trajectories are placed in the experience pool to increase the diversity of experience samples, allowing the agent to learn from failed episodes and navigate to the goal more efficiently. To assess the effectiveness of the proposed approach, different experimental environments are built on the Gazebo platform, and a TurtleBot3 robot is used to conduct navigation training and transfer tests in simulation. The results show that the navigation success rate of the proposed algorithm in the training environments is higher than that of current mainstream algorithms and reaches 86.33% in the transfer test environment. The algorithm effectively improves the utilization of navigation samples, reduces the difficulty of learning navigation policies, and enhances the autonomous navigation and transfer generalization abilities of mobile robots in different environments.

Key words: reinforcement learning, mobile robots, hindsight experience replay, neural network, sample utilization
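
As a rough illustration of the replay scheme the abstract describes, the minimal sketch below relabels a failed trajectory at its maximum-advantage point and recomputes rewards against the new goal. All interfaces here (`Transition`, `advantage_fn`, `reward_fn`) are hypothetical stand-ins; the paper's actual data structures, network architecture, and reward definition are not reproduced.

```python
# Illustrative sketch only: interfaces are assumed, not taken from the paper.
import numpy as np
from typing import Callable, List, NamedTuple


class Transition(NamedTuple):
    state: np.ndarray
    action: int
    reward: float
    next_state: np.ndarray
    goal: np.ndarray


def relabel_with_max_advantage(
    trajectory: List[Transition],
    advantage_fn: Callable[[np.ndarray, int], float],
    reward_fn: Callable[[np.ndarray, np.ndarray], float],
) -> List[Transition]:
    """Pick the trajectory point with the largest advantage estimate as the
    new goal, then relabel the transitions up to that point against it."""
    # 1. Score each stored (state, action) pair with its advantage value,
    #    e.g. A(s, a) = Q(s, a) - V(s) from a dueling (D3QN-style) head.
    scores = [advantage_fn(t.state, t.action) for t in trajectory]

    # 2. The successor state of the max-advantage point becomes the new goal.
    best = int(np.argmax(scores))
    new_goal = trajectory[best].next_state

    # 3. Relabel the prefix of the trajectory: recompute each reward with
    #    respect to the new goal so a failed episode still yields a useful
    #    learning signal.
    return [
        Transition(t.state, t.action, reward_fn(t.next_state, new_goal),
                   t.next_state, new_goal)
        for t in trajectory[: best + 1]
    ]
```

In the scheme described above, both the original and the relabeled trajectories would then be pushed into the replay buffer; this is what increases sample diversity and lets the agent learn from failed experience samples.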