Moving least-squares approximations for linearly-solvable MDP

Mingyuan Zhong, E. Todorov
{"title":"Moving least-squares approximations for linearly-solvable MDP","authors":"Mingyuan Zhong, E. Todorov","doi":"10.1109/ADPRL.2011.5967383","DOIUrl":null,"url":null,"abstract":"By introducing Linearly-solvable Markov Decision Process (LMDP), a general class of nonlinear stochastic optimal control problems can be reduced to solving linear problems. However, in practice, LMDP defined on continuous state space remain difficult due to high dimensionality of the state space. Here we describe a new framework for finding this solution by using a moving least-squares approximation. We use efficient iterative solvers which do not require matrix factorization, so we could handle large numbers of bases. The basis functions are constructed based on collocation states which change over iterations of the algorithm, so as to provide higher resolution at the regions of state space that are visited more often. The shape of the bases is automatically defined given the collocation states, in a way that avoids gaps in the coverage and avoids fitting a tremendous amount of parameters. Numerical results on test problems are provided and demonstrate good behavior when scaled to problems with high dimensionality.","PeriodicalId":406195,"journal":{"name":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ADPRL.2011.5967383","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

By introducing the Linearly-solvable Markov Decision Process (LMDP), a general class of nonlinear stochastic optimal control problems can be reduced to solving linear problems. In practice, however, LMDPs defined on continuous state spaces remain difficult to solve due to the high dimensionality of the state space. Here we describe a new framework for finding the solution using a moving least-squares approximation. We use efficient iterative solvers that do not require matrix factorization, so we can handle large numbers of bases. The basis functions are constructed from collocation states that change over iterations of the algorithm, providing higher resolution in the regions of state space that are visited more often. The shape of the bases is defined automatically given the collocation states, in a way that avoids gaps in coverage and avoids fitting an excessive number of parameters. Numerical results on test problems are provided and demonstrate good behavior when scaled to problems of high dimensionality.
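To make the approximation scheme concrete, the core moving least-squares idea can be sketched as follows: at each query point, fit a low-order polynomial by weighted least squares, with weights that decay with distance from the query. This is a minimal 1-D illustration, not the paper's actual algorithm; the Gaussian weight function, the bandwidth `h`, and the function names are assumptions made for the sketch.

```python
import numpy as np

def mls_eval(xq, xs, fs, h=0.3, degree=1):
    """Moving least-squares estimate of f at query point xq.

    Fits a local polynomial of the given degree to the samples (xs, fs),
    weighting each sample by a Gaussian kernel centered at xq (an
    illustrative weight choice; other compactly supported kernels work too).
    """
    w = np.exp(-0.5 * ((xs - xq) / h) ** 2)   # locality weights
    A = np.vander(xs - xq, degree + 1)        # local basis, shifted to xq
    # Weighted normal equations: (A^T W A) c = A^T W f
    AtW = A.T * w
    c = np.linalg.solve(AtW @ A, AtW @ fs)
    # Basis is centered at xq, so the constant coefficient is the estimate.
    return c[-1]

# Scattered samples of a smooth test function
rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 2.0 * np.pi, 200)
fs = np.sin(xs)
est = mls_eval(np.pi / 2, xs, fs)  # should be close to sin(pi/2) = 1
```

Because the fit is redone at every query point, the approximation adapts locally without committing to a single global parameter vector, which is what lets collocation-state-based schemes concentrate resolution where the state space is visited most.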