
计算机工程 ›› 2023, Vol. 49 ›› Issue (12): 111-120. doi: 10.19678/j.issn.1000-3428.0066348

• Artificial Intelligence and Pattern Recognition •

Robot Path Planning Based on Improved DQN Algorithm

Qiru LI, Xia GENG

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212000, Jiangsu, China
  • Received: 2022-11-24  Online: 2023-12-15  Published: 2023-12-14
  • About the authors:

    LI Qiru (born 2003), male, undergraduate student; his main research interests are artificial intelligence and pattern recognition

    GENG Xia, associate professor, Ph.D.

  • Funding:
    National Natural Science Foundation of China (62276116)


Abstract:

The traditional Deep Q Network(DQN) algorithm overcomes the curse of dimensionality that the Q-learning algorithm suffers from in complex environments by combining deep neural networks with reinforcement learning, and it is therefore widely used in mobile robot path planning. However, the traditional DQN algorithm converges slowly and produces poor paths, making it difficult to obtain an optimal path within a small number of training episodes. To address these problems, an improved algorithm, ERDQN, is proposed. The frequency with which each state recurs is recorded, and the Q value is recalculated from this frequency, so that the more often a state appears during network training, the lower the probability of it being visited again. This improves the robot's ability to explore the environment, reduces the risk of the network converging to a local optimum to a certain extent, and decreases the number of training episodes required for convergence. The reward function is redesigned according to the robot's moving direction and its distance from the target point. The robot obtains a positive reward when it moves toward the target point and a negative reward when it moves away from it, and the absolute value of the reward is adjusted according to the current moving direction and the distance to the target; thus, the robot can plan a better path while avoiding obstacles. The experimental results show that, compared with the DQN algorithm, the ERDQN algorithm increases the average score by 18.9%, shortens the planned path length by approximately 20.1%, and reduces the number of training episodes required by approximately 500. These results demonstrate that the ERDQN algorithm effectively improves the network convergence speed and the path planning performance.
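
The abstract describes the two modifications only at a high level, so the sketch below is one plausible Python reading of them rather than the authors' implementation: a visit-frequency record that damps the Q-learning bootstrap target for states the agent keeps revisiting, and a reward shaped by the robot's moving direction and its distance to the target. The penalty weight `beta`, the distance scale `k`, the terminal rewards, and the grid-style state encoding are all illustrative assumptions.

```python
# Minimal sketch of the two ERDQN ideas described above; the exact formulas are
# not given in the abstract, so this is an interpretation, not the paper's code.
from collections import defaultdict
import math


class VisitFrequencyTracker:
    """Records how often each (discretised) state has occurred and uses that
    frequency to attenuate the DQN bootstrap target, so repeatedly visited
    states become less attractive and unexplored states are favoured."""

    def __init__(self, beta=0.1):            # beta: assumed penalty strength
        self.visit_count = defaultdict(int)   # state -> number of occurrences
        self.beta = beta

    def observe(self, state):
        self.visit_count[state] += 1

    def adjusted_target(self, reward, gamma, max_next_q, next_state):
        """Standard target r + gamma * max_a Q(s', a), scaled down as the
        next state's visit count grows."""
        n = self.visit_count[next_state]
        attenuation = 1.0 / (1.0 + self.beta * n)
        return reward + gamma * attenuation * max_next_q


def shaped_reward(prev_pos, new_pos, goal, collided, reached, k=1.0):
    """Direction- and distance-based reward: positive when the step brings the
    robot closer to the goal, negative when it moves away; the magnitude is
    scaled by the change in distance (k is an assumed scale factor)."""
    if collided:
        return -10.0                          # illustrative obstacle penalty
    if reached:
        return 10.0                           # illustrative goal reward
    return k * (math.dist(prev_pos, goal) - math.dist(new_pos, goal))


# Toy usage on a 2-D grid (coordinates are hypothetical):
tracker = VisitFrequencyTracker(beta=0.1)
tracker.observe((3, 4))
target = tracker.adjusted_target(reward=0.5, gamma=0.99, max_next_q=2.0, next_state=(3, 4))
step_reward = shaped_reward((3, 4), (3, 5), goal=(8, 8), collided=False, reached=False)
```

In this reading, the attenuation factor stands in for the frequency-based recalculation of the Q value, and the shaped reward reproduces the positive/negative reward for moving toward or away from the target; both pieces would sit inside an otherwise standard DQN training loop with experience replay and a target network.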

Key words: Deep Q Network(DQN) algorithm, path planning, Deep Reinforcement Learning(DRL), state exploration, reward function, obstacle avoidance