
Computer Engineering ›› 2021, Vol. 47 ›› Issue (2): 90-94, 102. doi: 10.19678/j.issn.1000-3428.0057027

• Artificial Intelligence and Pattern Recognition •

Value Iteration Algorithm for POMDP Based on Recurrent Convolutional Neural Network

YU Danning, NI Kun, LIU Yunlong

  1. School of Aerospace Engineering, Xiamen University, Xiamen, Fujian 361102, China
  • Received: 2019-12-25 Revised: 2020-02-04 Online: 2021-02-15 Published: 2020-02-12
  • About the authors: YU Danning (born 1994), female, master's student, whose main research interests are deep reinforcement learning and agent decision-making; NI Kun, master's student; LIU Yunlong (corresponding author), associate professor, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61772438, 61375077).

Abstract: QMDP-net, a value iteration algorithm for Partially Observable Markov Decision Processes (POMDPs) based on a Convolutional Neural Network (CNN), performs well without prior knowledge. However, it suffers from optimization difficulties such as unstable training and sensitivity to parameters. To address these problems, this paper proposes RQMDP-net, a value iteration algorithm for POMDPs based on a Recurrent Convolutional Neural Network (RCNN). The value iteration update is implemented with a Gated Recurrent Unit (GRU) network, which preserves the convolutional structure of the input and recurrent weight matrices while strengthening the network's ability to process sequential data. Experimental results show that RQMDP-net reaches a navigation accuracy of 98.5% on 10×10 grid-map planning tasks and outperforms QMDP-net by up to 5.8 percentage points on 36×36 grid-map planning tasks, demonstrating faster network convergence and stronger planning ability in navigation tasks.

Key words: Partially Observable Markov Decision Process (POMDP), value iteration, Convolutional Neural Network (CNN), Recurrent Convolutional Neural Network (RCNN), agent planning
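
The gated update mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' RQMDP-net implementation: it assumes a single-channel grid, 3×3 kernels, and the standard GRU gate equations with matrix products replaced by convolutions. The names `conv2d_same`, `conv_gru_step`, and the `params` dictionary are hypothetical, and the kernel values are untrained placeholders.

```python
import math

def conv2d_same(grid, kernel):
    """3x3 'same' cross-correlation with zero padding on a 2D list."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n_rows and 0 <= jj < n_cols:
                        total += kernel[di + 1][dj + 1] * grid[ii][jj]
            out[i][j] = total
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def zip_map(fn, *grids):
    """Apply fn elementwise across equally sized 2D lists."""
    return [[fn(*cells) for cells in zip(*rows)] for rows in zip(*grids)]

def conv_gru_step(x, h, params):
    """One convolutional GRU update (assumed form, not taken from the paper).

    x: input map (for example a reward map); h: current value map.
    Both the input weights (W_*) and recurrent weights (U_*) are 3x3 kernels,
    so the update stays convolutional.
    """
    # Update gate z and reset gate r.
    z = zip_map(lambda a, b: sigmoid(a + b),
                conv2d_same(x, params["W_z"]), conv2d_same(h, params["U_z"]))
    r = zip_map(lambda a, b: sigmoid(a + b),
                conv2d_same(x, params["W_r"]), conv2d_same(h, params["U_r"]))
    # Candidate value map computed from the input and the reset-gated state.
    rh = zip_map(lambda a, b: a * b, r, h)
    cand = zip_map(lambda a, b: math.tanh(a + b),
                   conv2d_same(x, params["W_h"]), conv2d_same(rh, params["U_h"]))
    # Gated blend of the old value map and the candidate.
    return zip_map(lambda zi, hi, ci: (1.0 - zi) * hi + zi * ci, z, h, cand)

# Usage: run a few recurrent steps on a 4x4 reward map with one goal cell.
kernel = [[0.1] * 3 for _ in range(3)]  # placeholder weights, untrained
params = {name: kernel for name in ("W_z", "U_z", "W_r", "U_r", "W_h", "U_h")}
reward = [[0.0] * 4 for _ in range(4)]
reward[3][3] = 1.0
value = [[0.0] * 4 for _ in range(4)]
for _ in range(5):
    value = conv_gru_step(reward, value, params)
```

Each call plays the role of one value iteration sweep: information about the goal cell spreads outward by one cell per step, and the gates decide how much of the previous value map is retained. In a trained network the kernels would be learned end to end rather than fixed by hand.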
