Temporal difference learning with Interpolated N-Tuple networks: initial results on pole balancing

Aisha A. Abdullahi, S. Lucas
Published in: 2010 UK Workshop on Computational Intelligence (UKCI), 2010-11-09
DOI: 10.1109/UKCI.2010.5625609
Citations: 2

Abstract

Temporal difference learning (TDL) is perhaps the most widely used reinforcement learning method and gives competitive results on a range of problems, especially when using linear or table-based function approximators. However, it has been shown to give poor results on some continuous control problems and an important question is how it can be applied to such problems more effectively. The crucial point is how TDL can be generalized and scaled to deal with complex, high-dimensional problems without suffering from the curse of dimensionality. We introduce a new function approximation architecture called the Interpolated N-Tuple network and perform a proof-of-concept test on a classic reinforcement learning problem of pole balancing. The results show the method to be highly effective on this problem. They offer an important counter-example to some recently reported results that showed neuro-evolution outperforming TDL. The TDL with Interpolated N-Tuple networks learns to balance the pole considerably faster than the leading neuro-evolution techniques.
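To make the abstract's core claim concrete: TDL with a table-based approximator bootstraps a value estimate for each state from the estimate of its successor. The sketch below is illustrative only and does not implement the paper's Interpolated N-Tuple network (whose details are not given here); it applies tabular TD(0) to a toy one-dimensional "pole" that falls when the state reaches either boundary. The function name `td0_value_learning` and the task parameters are hypothetical choices for this example.

```python
import random

def td0_value_learning(n_states=11, episodes=500, alpha=0.1, gamma=0.95, seed=0):
    """Tabular TD(0) on a random walk where the 'pole' falls at either end.

    Illustrative sketch only -- not the paper's Interpolated N-Tuple network.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states          # table-based value function, one entry per state
    terminal = {0, n_states - 1}  # boundary states: the pole has fallen
    for _ in range(episodes):
        s = n_states // 2         # start balanced in the middle
        while s not in terminal:
            s2 = s + rng.choice((-1, 1))
            r = -1.0 if s2 in terminal else 0.0        # penalty only on falling
            target = r if s2 in terminal else r + gamma * V[s2]
            V[s] += alpha * (target - V[s])            # TD(0) update
            s = s2
    return V

values = td0_value_learning()
```

States near the middle, from which the pole survives longest, end up with values closer to zero than states adjacent to the boundaries, which is exactly the gradient a controller would exploit to keep the pole balanced.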