1. ZHOU C H, GU S D, WEN Y Q, et al. The review unmanned surface vehicle path planning: based on multi-modality constraint. Ocean Engineering, 2020, 200: 107043. doi: 10.1016/j.oceaneng.2020.107043
2. ZHANG W M, ZHANG Y, ZHANG H. Path planning of coal mine rescue robot based on improved A* algorithm. Coal Geology & Exploration, 2022, 50(12): 185-193.
3. CHEN D F, LEI H, LIU J L, et al. Research on robot path planning based on reinforced ant colony optimization. Journal of Ordnance Equipment Engineering, 2023, 44(6): 239-245, 303.
4. PEI Y, SU S, FU J S, et al. A fast genetic algorithm operator for solving complex optimization problems. Journal of Jilin University (Science Edition), 2021, 59(3): 602-608.
5. YANG S M, SHAN Z, CAO J, et al. Application of model-based reinforcement learning in UAV path planning. Computer Engineering, 2022, 48(12): 255-260, 269.
6. MATSUO Y, LECUN Y, SAHANI M, et al. Deep learning, reinforcement learning, and world models. Neural Networks, 2022, 152: 267-275. doi: 10.1016/j.neunet.2022.03.037
7. LIU X, LIU S Y, ZHUANG Y K, et al. Explainable reinforcement learning: basic problems exploration and method survey. Journal of Software, 2023, 34(5): 2300-2316.
8.
9.
10. ZHAO J D, GAN Z G, LIANG J K, et al. Path planning research of a UAV base station searching for disaster victims' location information based on deep reinforcement learning. Entropy, 2022, 24(12): 1767.
11. ZHENG S, LUO F, GU C H, et al. Improved Speedy Q-learning algorithm based on double estimator. Computer Science, 2020, 47(7): 179-185.
12. LIU B Y, YE X B, ZHOU C F, et al. The improved algorithm of deep Q-learning network based on eligibility trace[C]//Proceedings of the 6th International Conference on Control, Automation and Robotics. Washington D. C., USA: IEEE Press, 2020: 230-235.
13. HU R J, ZHANG Y L. Fast path planning for long-range planetary roving based on a hierarchical framework and deep reinforcement learning. Aerospace, 2022, 9(2): 101.
14. LIU Q, YAN Y, ZHU F, et al. A deep recurrent Q network with exploratory noise. Chinese Journal of Computers, 2019, 42(7): 1588-1604.
15. LU J J, LIU W X, ZHU Y H, et al. Scheduling mix-flow in SD-DCN based on deep reinforcement learning with private link[C]//Proceedings of the 16th International Conference on Mobility, Sensing and Networking. Washington D. C., USA: IEEE Press, 2021: 395-401.
16.
17. ZHAO Y N, LIU P, ZHAO W, et al. Twice sampling method in deep Q-network. Acta Automatica Sinica, 2019, 45(10): 1870-1882.
18. LÜ L H, ZHANG S J, DING D R, et al. Path planning via an improved DQN-based learning policy. IEEE Access, 2019, 7: 67319-67330.
19. LIU Y H, XU Y Z. Free gait planning of hexapod robot based on improved DQN algorithm[C]//Proceedings of the 2nd International Conference on Civil Aviation Safety and Information Technology. Washington D. C., USA: IEEE Press, 2021: 488-491.
20. LI J X, CHEN Y T, ZHAO X N, et al. An improved DQN path planning algorithm. The Journal of Supercomputing, 2022, 78(1): 616-639.
21. LIU Y L, CHEN Z G, LI Y G, et al. Robot search path planning method based on prioritized deep reinforcement learning. International Journal of Control, Automation and Systems, 2022, 20(8): 2669-2680.
22. ZHANG Y, WANG T B. Applying value-based deep reinforcement learning on KPI time series anomaly detection[C]//Proceedings of the 15th International Conference on Cloud Computing. Washington D. C., USA: IEEE Press, 2022: 197-202.
23. LIU Q, ZHAI J W, ZHANG Z C, et al. A survey on deep reinforcement learning. Chinese Journal of Computers, 2018, 41(1): 1-27.
24. MA A, YU Y H, YANG S L, et al. Survey of knowledge graph based on reinforcement learning. Journal of Computer Research and Development, 2022, 59(8): 1694-1722.
25. LIN S W, LIU A, WANG J G, et al. A review of path-planning approaches for multiple mobile robots. Machines, 2022, 10(9): 773.
26. DONG Y F, YANG C, DONG Y, et al. Robot path planning based on improved DQN. Computer Engineering and Design, 2021, 42(2): 552-558.
27. LEE S M, KIM S B. Parallel simulated annealing with a greedy algorithm for Bayesian network structure learning. IEEE Transactions on Knowledge and Data Engineering, 2020, 32(6): 1157-1166.