[1] ZHAN Weiwei,WANG Wei,CHEN Nengcheng,et al.A UAV trajectory planning using improved A* algorithm[J].Geomatics and Information Science of Wuhan University,2015,40(3):315-320.(in Chinese)占伟伟,王伟,陈能成,等.一种利用改进A*算法的无人机航迹规划[J].武汉大学学报(信息科学版),2015,40(3):315-320.
[2] LI Nan,LIU Peng,DENG Renbo,et al.Three dimensional path planning for unmanned aerial vehicles based on improved genetic algorithm[J].Computer Simulation,2017,34(12):22-25,35.(in Chinese)李楠,刘朋,邓人博,等.基于改进遗传算法的无人机三维航路规划[J].计算机仿真,2017,34(12):22-25,35.
[3] GE Yan,SHUI Wei,HAN Yu,et al.Route optimization based on Bayesian network and ant colony algorithm[J].Computer Engineering,2009,35(12):175-177.(in Chinese)葛艳,税薇,韩玉,等.基于贝叶斯网络和蚁群算法的航路优化[J].计算机工程,2009,35(12):175-177.
[4] FANG Qun,XU Qing.3D route planning for UAV based on improved PSO algorithm[J].Journal of Northwestern Polytechnical University,2017,35(1):66-73.(in Chinese)方群,徐青.基于改进粒子群算法的无人机三维航迹规划[J].西北工业大学学报,2017,35(1):66-73.
[5] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.
[6] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing Atari with deep reinforcement learning[EB/OL].[2020-01-23].https://arxiv.org/pdf/1312.5602v1.pdf.
[7] MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]//Proceedings of International Conference on Machine Learning.Washington D.C.,USA:IEEE Press,2016:1928-1937.
[8] JARADAT M A K,AL-ROUSAN M,QUADAN L.Reinforcement based mobile robot navigation in dynamic environment[J].Robotics and Computer Integrated Manufacturing,2011,27(1):135-149.
[9] ZHU Y,MOTTAGHI R,KOLVE E,et al.Target-driven visual navigation in indoor scenes using deep reinforcement learning[C]//Proceedings of IEEE International Conference on Robotics and Automation.Washington D.C.,USA:IEEE Press,2017:3357-3364.
[10] TAI L,LIU M.Towards cognitive exploration through deep reinforcement learning for mobile robots[EB/OL].[2020-01-23].https://arxiv.org/pdf/1610.01733.pdf.
[11] TAI L,PAOLO G,LIU M.Virtual-to-real deep reinforcement learning:continuous control of mobile robots for mapless navigation[EB/OL].[2020-01-23].https://arxiv.org/pdf/1703.00420.pdf.
[12] DING Mingyue,ZHENG Changwen,ZHOU Chengping,et al.UAV flight path planning[M].Beijing:Publishing House of Electronics Industry,2009.(in Chinese)丁明跃,郑昌文,周程平,等.无人飞行器航迹规划[M].北京:电子工业出版社,2009.
[13] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529(7587):484-489.
[14] SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of Go without human knowledge[J].Nature,2017,550(7676):354-359.
[15] IOFFE S,SZEGEDY C.Batch normalization:accelerating deep network training by reducing internal covariate shift[EB/OL].[2020-01-23].https://arxiv.org/pdf/1502.03167.pdf.
[16] HAHNLOSER R H R,SARPESHKAR R,MAHOWALD M A,et al.Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit[J].Nature,2000,405(6789):947-951.
[17] HUANG G,LIU Z,VAN DER MAATEN L,et al.Densely connected convolutional networks[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2017:4700-4708.
[18] KAUFMAN H,HOWARD R A.Dynamic programming and Markov processes[J].The American Mathematical Monthly,1961,68(2):194-201.
[19] SUTTON R S,BARTO A G.Reinforcement learning:an introduction[J].IEEE Transactions on Neural Networks,1998,9(5):1054-1068.
[20] KINGMA D P,BA J.Adam:a method for stochastic optimization[EB/OL].[2020-01-23].https://arxiv.org/pdf/1412.6980.pdf.
[21] SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policy optimization algorithms[EB/OL].[2020-01-23].https://arxiv.org/pdf/1707.06347.pdf.
[22] SCHULMAN J,LEVINE S,ABBEEL P,et al.Trust region policy optimization[EB/OL].[2020-01-23].https://arxiv.org/pdf/1502.05477.pdf.