Automatic basis function construction for approximate dynamic programming and reinforcement learning

Proceedings of the 23rd international conference on Machine learning. Pub Date: 2006-06-25. DOI: 10.1145/1143844.1143901

Philipp W. Keller, Shie Mannor, Doina Precup

Citations: 182

Abstract

We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castañon (1989) who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error, or on the temporal difference (TD) error. We then place basis functions in the lower-dimensional space. These are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.
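The loop the abstract describes (fit a linear value function, measure the Bellman error, add new features driven by that error) can be illustrated on a toy problem. The sketch below is not the authors' method: it omits the neighborhood component analysis step entirely and instead groups states by quantiles of their Bellman error, which is closer in spirit to the state aggregation idea attributed to Bertsekas and Castañon. The random MDP, the Bellman-residual least-squares fit, and all function names (`fit_weights`, `bellman_error`, `add_error_aggregates`, `run_demo`) are assumptions made for illustration.

```python
import numpy as np


def fit_weights(Phi, P, r, gamma):
    """Bellman-residual least squares: minimise ||r - (I - gamma P) Phi w||."""
    M = Phi - gamma * (P @ Phi)
    w, *_ = np.linalg.lstsq(M, r, rcond=None)
    return w


def bellman_error(Phi, w, P, r, gamma):
    """Per-state Bellman error of V = Phi w under a fixed policy."""
    V = Phi @ w
    return r + gamma * (P @ V) - V


def add_error_aggregates(Phi, err, n_bins):
    """Append indicator features for states with similar Bellman error.

    Assumption: quantile binning stands in for the paper's learned
    low-dimensional mapping; it is a simplification, not NCA.
    """
    edges = np.quantile(err, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    groups = np.digitize(err, edges)           # group index in 0..n_bins-1
    indicators = np.eye(n_bins)[groups]        # one-hot rows, (n_states, n_bins)
    indicators = indicators[:, indicators.any(axis=0)]  # drop empty groups
    return np.hstack([Phi, indicators])


def run_demo(n_states=30, n_rounds=4, n_bins=3, gamma=0.9, seed=0):
    """Grow the feature set on a random fixed-policy MDP (hypothetical toy)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)          # row-stochastic transitions
    r = rng.random(n_states)
    Phi = np.ones((n_states, 1))               # start from a constant feature
    norms = []
    for _ in range(n_rounds):
        w = fit_weights(Phi, P, r, gamma)
        err = bellman_error(Phi, w, P, r, gamma)
        norms.append(float(np.linalg.norm(err)))
        Phi = add_error_aggregates(Phi, err, n_bins)
    return Phi, norms
```

Calling `run_demo()` returns the final feature matrix and the Bellman error norm recorded before each expansion. Because features are only ever appended and the weights minimise the Bellman residual, that norm cannot increase from one round to the next in this sketch.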

