Algorithms for large-scale partially observable Markov decision processes (POMDPs) suffer from exponential growth of the policy tree and from the difficulty of computing witness points (WPs). To address both problems, this paper exploits the fact that the value function of a policy tree is piecewise linear and convex, and proposes a belief-point-based algorithm for incremental policy-tree pruning and value iteration. During policy-tree generation, boundary points are used for lossless pruning and intermediate points for lossy pruning, and the real-time belief-state distribution is used to obtain an approximately optimal solution. Comparative experiments show that the algorithm converges quickly and reaches reward values of comparable accuracy in less time.
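The pruning idea in the abstract rests on a standard property: a piecewise linear convex value function can be written as the maximum over a set of linear functions (alpha vectors), one per policy tree, so a tree can be dropped if its vector is never the maximizer at any belief point under consideration. The following is a minimal sketch of generic point-based pruning, not the paper's algorithm: the function names, the toy alpha vectors, and the choice of simplex corners and one interior point as belief points are all assumptions made for illustration, and they are not meant to reproduce the paper's boundary-point and intermediate-point definitions.

```python
def value(alpha, belief):
    """Value of one policy tree (alpha vector) at a belief: a dot product."""
    return sum(a * b for a, b in zip(alpha, belief))


def prune_at_points(alphas, beliefs):
    """Keep only the alpha vectors that are maximal at some belief point.

    Generic point-based pruning (illustrative, hypothetical helper):
    vectors that win at none of the sampled beliefs are discarded.
    """
    kept = []
    for b in beliefs:
        best = max(range(len(alphas)), key=lambda i: value(alphas[i], b))
        if best not in kept:
            kept.append(best)
    return [alphas[i] for i in sorted(kept)]


def corner_beliefs(n):
    """Corners of the belief simplex: full certainty about one state."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Toy two-state example (made-up numbers, for illustration only).
alphas = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6], [0.2, 0.2]]
corners = corner_beliefs(2)
interior = [[0.5, 0.5]]

kept_corners = prune_at_points(alphas, corners)
kept_all = prune_at_points(alphas, corners + interior)
```

In this toy case the vector `[0.2, 0.2]` is dominated everywhere and is dropped by either belief set, whereas `[0.6, 0.6]` survives only once the interior point is included. This shows why the choice of belief points determines whether pruning loses information.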