Online Planning Algorithm Based on Posterior Belief Clustering

Computer Engineering, 2013, Vol. 39, Issue 4: 214-218. doi: 10.3969/j.issn.1000-3428.2013.04.049

WU Bo^1,2,3, WU Min^2,3

(1. Education Technology and Information Center, Shenzhen Polytechnic, Shenzhen 518055, China; 2. School of Information Science and Engineering, Central South University, Changsha 410083, China; 3. Hunan Engineering Laboratory for Advanced Control and Intelligent Automation, Changsha 410083, China)

Received: 2012-05-08 Publication date: 2013-04-15 Release date: 2013-04-12
About the authors: WU Bo (born 1979), male, associate professor and Ph.D. candidate; main research interests: machine learning and wireless sensor networks. WU Min, professor, Ph.D., doctoral supervisor.
Funding: National Natural Science Foundation of China (61074058); Natural Science Foundation of Guangdong Province (S2011040004769)

Abstract

Abstract: In continuous-state Partially Observable Markov Decision Processes (POMDPs), online planning cannot satisfy the requirements of high real-time performance and low error at the same time. To address this problem, a forward-search online planning algorithm called Posterior Belief Clustering (PBC) is proposed. PBC uses the KL divergence to analyze the error between posterior beliefs over continuous states and, based on this error analysis, clusters the posterior beliefs. The reward value is then computed only once per cluster, using the clustered posterior belief, and a branch-and-bound pruning method is used to prune the posterior belief AND/OR tree online. Experimental results show that the algorithm effectively reduces the size of the problem to be solved, eliminates repeated computation, and achieves good real-time performance with low error.

Key words: Partially Observable Markov Decision Process (POMDP), Posterior Belief Clustering (PBC), online planning, KL divergence, branch-and-bound
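
The clustering step summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, purely for illustration, that each posterior belief is a one-dimensional Gaussian given as (mean, variance), that clusters are formed greedily against the first member of each cluster, and that `epsilon` and `reward_fn` are hypothetical names for the KL threshold and the reward evaluation. The only point it demonstrates is that beliefs within a small KL divergence of each other share one reward computation.

```python
import math
from typing import Callable, List, Tuple

# Assumption for illustration: a belief is a 1-D Gaussian (mean, variance).
Belief = Tuple[float, float]


def kl_gaussian(p: Belief, q: Belief) -> float:
    """Closed-form KL(p || q) for two univariate Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def cluster_beliefs(beliefs: List[Belief], epsilon: float) -> List[List[int]]:
    """Greedy clustering by KL divergence.

    A belief joins the first cluster whose representative (the cluster's
    first member) is within KL divergence epsilon; otherwise it starts a
    new cluster. Returns clusters as lists of indices into `beliefs`.
    """
    clusters: List[List[int]] = []
    for i, belief in enumerate(beliefs):
        for members in clusters:
            if kl_gaussian(belief, beliefs[members[0]]) <= epsilon:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def clustered_rewards(
    beliefs: List[Belief],
    epsilon: float,
    reward_fn: Callable[[Belief], float],
) -> Tuple[List[float], int]:
    """Evaluate reward_fn once per cluster and share the value among members."""
    clusters = cluster_beliefs(beliefs, epsilon)
    rewards = [0.0] * len(beliefs)
    for members in clusters:
        value = reward_fn(beliefs[members[0]])  # one evaluation per cluster
        for i in members:
            rewards[i] = value
    return rewards, len(clusters)


if __name__ == "__main__":
    posterior_beliefs = [(0.0, 1.0), (0.05, 1.0), (3.0, 1.0), (3.02, 1.0)]
    # Hypothetical reward: negative squared distance of the belief mean from 0.
    rewards, n_clusters = clustered_rewards(
        posterior_beliefs, epsilon=0.01, reward_fn=lambda b: -b[0] ** 2
    )
    print(n_clusters, rewards)
```

The threshold trades accuracy for speed: a larger `epsilon` merges more posterior beliefs and saves more reward evaluations, at the cost of a larger approximation error. A greedy first-representative rule is used here only because it is short; the clustering rule in the paper may differ.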

CLC number: TP301.6

WU Bo, WU Min. Online Planning Algorithm Based on Posterior Belief Clustering[J]. Computer Engineering, 2013, 39(4): 214-218.

http://www.ecice06.com/CN/Y2013/V39/I4/214

Editor: LIU Bing


