
计算机工程 ›› 2023, Vol. 49 ›› Issue (12): 111-120. doi: 10.19678/j.issn.1000-3428.0066348

• Artificial Intelligence and Pattern Recognition •

Robot Path Planning Based on Improved DQN Algorithm

Qiru LI, Xia GENG

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212000, Jiangsu, China
  • Received: 2022-11-24  Online: 2023-12-15  Published: 2023-12-14
  • About the authors:

    LI Qiru (born 2003), male, undergraduate student; his main research interests are artificial intelligence and pattern recognition

    GENG Xia, associate professor, Ph.D.

  • Funding:
    National Natural Science Foundation of China (62276116)


Abstract:

The traditional Deep Q Network(DQN) algorithm overcomes the curse of dimensionality that the Q-learning algorithm suffers from in complex environments by combining deep neural networks with reinforcement learning, and it is therefore widely used in mobile robot path planning. However, the traditional DQN algorithm converges slowly and produces poor paths, making it difficult to obtain an optimal path within a small number of training episodes. To address these problems, an improved algorithm, ERDQN, is proposed. The frequency with which each state recurs is recorded, and the Q value is recalculated from this frequency, so that the more often a state appears during network training, the lower the probability of it being visited again. This improves the robot's ability to explore the environment, reduces the risk of the network converging to a local optimum to a certain extent, and decreases the number of training episodes required for convergence. The reward function is redesigned according to the robot's moving direction and its distance from the target point. The robot obtains a positive reward when it moves toward the target point and a negative reward when it moves away from it, and the absolute value of the reward is adjusted according to the current moving direction and the distance to the target; thus, the robot can plan a better path while avoiding obstacles. The experimental results show that, compared with the DQN algorithm, the ERDQN algorithm increases the average score by 18.9%, shortens the planned path length by approximately 20.1%, and reduces the number of training episodes required by approximately 500. These results demonstrate that the ERDQN algorithm effectively improves the network convergence speed and the path planning performance.
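
The abstract describes the two modifications only at a high level, so the sketch below is one plausible Python reading of them rather than the authors' implementation: a visit-frequency record that damps the Q-learning bootstrap target for states the agent keeps revisiting, and a reward shaped by the robot's moving direction and its distance to the target. The penalty weight `beta`, the distance scale `k`, the terminal rewards, and the grid-style state encoding are all illustrative assumptions.

```python
# Minimal sketch of the two ERDQN ideas described above; the exact formulas are
# not given in the abstract, so this is an interpretation, not the paper's code.
from collections import defaultdict
import math


class VisitFrequencyTracker:
    """Records how often each (discretised) state has occurred and uses that
    frequency to attenuate the DQN bootstrap target, so repeatedly visited
    states become less attractive and unexplored states are favoured."""

    def __init__(self, beta=0.1):            # beta: assumed penalty strength
        self.visit_count = defaultdict(int)   # state -> number of occurrences
        self.beta = beta

    def observe(self, state):
        self.visit_count[state] += 1

    def adjusted_target(self, reward, gamma, max_next_q, next_state):
        """Standard target r + gamma * max_a Q(s', a), scaled down as the
        next state's visit count grows."""
        n = self.visit_count[next_state]
        attenuation = 1.0 / (1.0 + self.beta * n)
        return reward + gamma * attenuation * max_next_q


def shaped_reward(prev_pos, new_pos, goal, collided, reached, k=1.0):
    """Direction- and distance-based reward: positive when the step brings the
    robot closer to the goal, negative when it moves away; the magnitude is
    scaled by the change in distance (k is an assumed scale factor)."""
    if collided:
        return -10.0                          # illustrative obstacle penalty
    if reached:
        return 10.0                           # illustrative goal reward
    return k * (math.dist(prev_pos, goal) - math.dist(new_pos, goal))


# Toy usage on a 2-D grid (coordinates are hypothetical):
tracker = VisitFrequencyTracker(beta=0.1)
tracker.observe((3, 4))
target = tracker.adjusted_target(reward=0.5, gamma=0.99, max_next_q=2.0, next_state=(3, 4))
step_reward = shaped_reward((3, 4), (3, 5), goal=(8, 8), collided=False, reached=False)
```

In this reading, the attenuation factor stands in for the frequency-based recalculation of the Q value, and the shaped reward reproduces the positive/negative reward for moving toward or away from the target; both pieces would sit inside an otherwise standard DQN training loop with experience replay and a target network.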

Key words: Deep Q Network(DQN) algorithm, path planning, Deep Reinforcement Learning(DRL), state exploration, reward function, obstacle avoidance