Research on Point-based Value Iteration Method for FO-POMDP

doi:10.3969/j.issn.1000-3428.2013.10.046

Computer Engineering

CHEN Li-na, HUANG Hong-bin, DENG Su

(Key Laboratory of Information System Engineering, National University of Defense Technology, Changsha 410073, China)

Received: 2012-05-14  Online: 2013-10-15  Published: 2013-10-14
About the authors: CHEN Li-na (1983-), female, Ph.D. candidate, main research interest: intelligent decision-making; HUANG Hong-bin, associate professor, Ph.D.; DENG Su, professor, doctoral supervisor
Funding: National Natural Science Foundation of China (71071160)

Abstract

This paper builds on the Partially Observable Markov Decision Process (POMDP) to present the First-Order Partially Observable Markov Decision Process (FO-POMDP), which expresses a POMDP using the situation-calculus structure of first-order logic. The level of abstraction of states is a key problem in solving an FO-POMDP; to characterize it, the paper proposes the concepts of state granularity and belief-state granularity. A granularity resolution method converts belief states to a single, fixed granularity, and a distance measure between belief points at that granularity is defined. With these, Point-based Value Iteration (PBVI) is extended to the logical abstraction level, giving First-Order PBVI (FO-PBVI). Experimental results show that the algorithm solves problems relatively quickly and with relatively good solution quality.

Key words: Partially Observable Markov Decision Process (POMDP); state space; belief state; granularity resolution; Point-based Value Iteration (PBVI)
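The abstract only summarizes FO-PBVI, but the propositional algorithm it extends can be illustrated concretely. Below is a minimal pure-Python sketch of standard PBVI as introduced by Pineau, Gordon and Thrun: a point-based Bellman backup that keeps one alpha-vector per belief point, plus a belief-set expansion step. It is not the authors' FO-PBVI: there is no situation-calculus representation and no granularity resolution. The nested-list layouts `T[a][s][s2]`, `O[a][s2][z]` and `R[s][a]`, all function names, the toy model, the L1 belief distance and the deterministic "farthest successor" expansion are hypothetical choices made for illustration; the paper defines its own belief-point distance at a fixed granularity, which may differ.

```python
"""Minimal sketch of standard (propositional) PBVI on a flat POMDP.

Assumed layouts (hypothetical, chosen for illustration):
  T[a][s][s2] = P(s2 | s, a)
  O[a][s2][z] = P(z | s2, a)
  R[s][a]     = immediate reward
This is NOT FO-PBVI: there is no first-order (situation-calculus)
representation and no granularity resolution.
"""


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def value(b, alphas):
    """Lower-bound value of belief b: the best alpha-vector evaluated at b."""
    return max(dot(alpha, b) for alpha in alphas)


def pbvi_backup(beliefs, alphas, T, O, R, gamma):
    """One point-based Bellman backup: one new alpha-vector per belief point."""
    S, A, Z = len(R), len(T), len(O[0][0])
    # gao[a][z][k][s] = gamma * sum_s2 T[a][s][s2] * O[a][s2][z] * alphas[k][s2]
    gao = [[[[gamma * sum(T[a][s][s2] * O[a][s2][z] * alpha[s2] for s2 in range(S))
              for s in range(S)]
             for alpha in alphas]
            for z in range(Z)]
           for a in range(A)]
    new_alphas = []
    for b in beliefs:
        best_val, best_vec = float("-inf"), None
        for a in range(A):
            vec = [R[s][a] for s in range(S)]
            for z in range(Z):
                # Projected vector that scores best at this belief (first one on ties).
                proj = max(gao[a][z], key=lambda g: dot(g, b))
                vec = [v + g for v, g in zip(vec, proj)]
            val = dot(vec, b)
            if val > best_val:
                best_val, best_vec = val, vec
        new_alphas.append(best_vec)
    return new_alphas


def belief_update(b, a, z, T, O):
    """Bayes filter: b'(s2) is proportional to O[a][s2][z] * sum_s T[a][s][s2] * b(s)."""
    S = len(b)
    bp = [O[a][s2][z] * sum(b[s] * T[a][s][s2] for s in range(S)) for s2 in range(S)]
    total = sum(bp)
    return [x / total for x in bp] if total > 0 else None


def l1(u, v):
    return sum(abs(x - y) for x, y in zip(u, v))


def expand_beliefs(beliefs, T, O):
    """For each belief, add the one-step successor farthest (L1) from the set.

    Deterministic variant: every (action, observation) successor is tried
    instead of sampling one observation per action.
    """
    A, Z = len(T), len(O[0][0])
    expanded = [list(b) for b in beliefs]
    for b in beliefs:
        best, best_d = None, 0.0
        for a in range(A):
            for z in range(Z):
                bp = belief_update(b, a, z, T, O)
                if bp is None:
                    continue
                d = min(l1(bp, x) for x in expanded)
                if d > best_d:
                    best, best_d = bp, d
        if best is not None:
            expanded.append(best)
    return expanded


if __name__ == "__main__":
    # Toy model (hypothetical): the state never changes, observations carry no
    # information, and action a pays 1 only in state a.
    T = [[[1.0, 0.0], [0.0, 1.0]]] * 2
    O = [[[0.5, 0.5], [0.5, 0.5]]] * 2
    R = [[1.0, 0.0], [0.0, 1.0]]
    beliefs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    alphas = [[0.0, 0.0]]  # a valid lower bound because all rewards are >= 0
    for _ in range(60):
        alphas = pbvi_backup(beliefs, alphas, T, O, R, gamma=0.5)
    print(round(value([1.0, 0.0], alphas), 6), round(value([0.5, 0.5], alphas), 6))
    # prints: 2.0 1.0
```

In the toy run, a certain belief converges to 1/(1 - 0.5) = 2, while the uniform belief converges to 1, because uninformative observations never let the agent learn the state. Because each backup produces only one vector per belief point, the cost of an iteration is governed by the size of the belief set rather than by the number of vectors an exact backup would generate; this is the approximation PBVI makes, and the abstract describes FO-PBVI as extending it to the logical abstraction level.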

CLC number: TP18

CHEN Li-na, HUANG Hong-bin, DENG Su. Research on Point-based Value Iteration Method for FO-POMDP[J]. Computer Engineering, 2013, 39(10). doi: 10.3969/j.issn.1000-3428.2013.10.046.

http://www.ecice06.com/CN/Y2013/V39/I10/217

Editor: GU Yi-fei

