Temporal difference learning with Interpolated N-Tuple networks: initial results on pole balancing

Aisha A. Abdullahi, S. Lucas
Published in: 2010 UK Workshop on Computational Intelligence (UKCI), 2010-11-09
DOI: 10.1109/UKCI.2010.5625609
Citations: 2

Abstract

Temporal difference learning (TDL) is perhaps the most widely used reinforcement learning method and gives competitive results on a range of problems, especially when using linear or table-based function approximators. However, it has been shown to give poor results on some continuous control problems and an important question is how it can be applied to such problems more effectively. The crucial point is how TDL can be generalized and scaled to deal with complex, high-dimensional problems without suffering from the curse of dimensionality. We introduce a new function approximation architecture called the Interpolated N-Tuple network and perform a proof-of-concept test on a classic reinforcement learning problem of pole balancing. The results show the method to be highly effective on this problem. They offer an important counter-example to some recently reported results that showed neuro-evolution outperforming TDL. The TDL with Interpolated N-Tuple networks learns to balance the pole considerably faster than the leading neuro-evolution techniques.
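To make the abstract's core claim concrete: TDL with a table-based approximator bootstraps a value estimate for each state from the estimate of its successor. The sketch below is illustrative only and does not implement the paper's Interpolated N-Tuple network (whose details are not given here); it applies tabular TD(0) to a toy one-dimensional "pole" that falls when the state reaches either boundary. The function name `td0_value_learning` and the task parameters are hypothetical choices for this example.

```python
import random

def td0_value_learning(n_states=11, episodes=500, alpha=0.1, gamma=0.95, seed=0):
    """Tabular TD(0) on a random walk where the 'pole' falls at either end.

    Illustrative sketch only -- not the paper's Interpolated N-Tuple network.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states          # table-based value function, one entry per state
    terminal = {0, n_states - 1}  # boundary states: the pole has fallen
    for _ in range(episodes):
        s = n_states // 2         # start balanced in the middle
        while s not in terminal:
            s2 = s + rng.choice((-1, 1))
            r = -1.0 if s2 in terminal else 0.0        # penalty only on falling
            target = r if s2 in terminal else r + gamma * V[s2]
            V[s] += alpha * (target - V[s])            # TD(0) update
            s = s2
    return V

values = td0_value_learning()
```

States near the middle, from which the pole survives longest, end up with values closer to zero than states adjacent to the boundaries, which is exactly the gradient a controller would exploit to keep the pole balanced.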