
Computer Engineering ›› 2021, Vol. 47 ›› Issue (5): 44-51. doi: 10.19678/j.issn.1000-3428.0057492

• Artificial Intelligence and Pattern Recognition •

Self-Learning Method of UAV Track Planning Strategy in Complex Environment with Multiple Constraints

QIU Yue, ZHENG Baitong, CAI Chao

  1. National Key Laboratory for Multi-Spectral Information Processing Technologies, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2020-02-25  Revised: 2020-04-28  Published: 2020-05-08
  • About the authors: QIU Yue (born 1994), female, M.S. candidate, main research interests: mission planning and computer vision; ZHENG Baitong, M.S. candidate; CAI Chao, associate professor, Ph.D.
  • Supported by:
    Natural Science Foundation of Jiangsu Province (BK20170914).

Abstract: In a complex environment with multiple constraints, most Unmanned Aerial Vehicle (UAV) track planning methods cannot obtain prior knowledge from historical experience, which results in poor adaptability to changing environments. To address this problem, this paper proposes a self-learning method for the track planning strategy based on deep reinforcement learning. The UAV state and action modes are designed around the flight constraints, reducing the search scale of track planning in both width and depth, and a reward and punishment function is designed according to the track optimization objective. A Monte Carlo Tree Search (MCTS) algorithm guided by a convolutional neural network is then used to learn the track planning strategy. Simulation results show that the strategy learned by the proposed method generalizes to unseen environments: compared with an untrained network, it requires only 17% of the NN-MCTS simulations to guide the UAV to its destination safely and without collision while satisfying the constraints in an unknown flight environment.
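The abstract describes a network-guided MCTS in which a convolutional policy/value network supplies action priors and state-value estimates that the tree search refines under the reward and punishment function. The article page carries no code; the sketch below is a minimal, generic illustration of one such PUCT-style search loop in Python. The eight-action set, the policy_value_net stub, the c_puct constant, and the environment callbacks step_fn, is_terminal_fn, and reward_fn are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of a network-guided MCTS (PUCT) loop in the spirit of the
# NN-MCTS named in the abstract. All names and hyperparameters are assumptions.
import math
import random

ACTIONS = list(range(8))  # assumed discrete heading changes available to the UAV

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy head
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # action -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def policy_value_net(state):
    # Stand-in for the convolutional policy/value network: uniform priors and a
    # random value. A real network would map the local obstacle/goal map to these.
    priors = {a: 1.0 / len(ACTIONS) for a in ACTIONS}
    return priors, random.uniform(-1.0, 1.0)

def select_child(node, c_puct=1.5):
    # PUCT rule: argmax_a  Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a))
    total = sum(child.visits for child in node.children.values())
    def score(item):
        _, child = item
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.value() + u
    return max(node.children.items(), key=score)

def simulate(root_state, step_fn, is_terminal_fn, reward_fn, n_sims=100):
    root = Node(prior=1.0)
    for _ in range(n_sims):
        node, state, path = root, root_state, []
        # 1) Selection: walk down existing children with the PUCT rule.
        while node.children and not is_terminal_fn(state):
            action, node = select_child(node)
            state = step_fn(state, action)
            path.append(node)
        # 2) Expansion + evaluation: let the network score the leaf.
        if is_terminal_fn(state):
            value = reward_fn(state)          # terminal reward or punishment
        else:
            priors, value = policy_value_net(state)
            node.children = {a: Node(p) for a, p in priors.items()}
        # 3) Backup: propagate the leaf value along the visited path.
        for visited in [root] + path:
            visited.visits += 1
            visited.value_sum += value
    # Root visit counts define the improved track-planning policy.
    return {a: child.visits for a, child in root.children.items()}

In an AlphaZero-style self-learning loop, the root visit counts returned by simulate would serve as policy targets for retraining the network, which is the general mechanism that "strategy self-learning" refers to; the authors' actual state encoding, reward design, and training scheme are given only in the full paper.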

Key words: deep reinforcement learning, Monte Carlo Tree Search (MCTS), track planning strategy, strategy self-learning, multiple constraints, complex environment

CLC Number: