Experimental analysis of eligibility traces strategies in temporal difference learning

Jinsong Leng, L. Jain, C. Fyfe
Journal: Int. J. Knowl. Eng. Soft Data Paradigms
Publication date: 2008-12-01
DOI: 10.1504/IJKESDP.2009.021982
URL: https://doi.org/10.1504/IJKESDP.2009.021982
Citations: 2

Abstract

Temporal difference (TD) learning is a model-free reinforcement learning technique that adopts an infinite-horizon discounted model and uses an incremental learning technique for dynamic programming. The state value function is updated from sample episodes. Eligibility traces are a key mechanism for enhancing the rate of convergence, and TD(λ) incorporates them through the parameter λ. However, the underlying mechanism of eligibility traces combined with function approximation is not well understood, from either a theoretical or a practical point of view. TD(λ) has been proved to converge with a local tabular state representation. Proving convergence of TD(λ) with function approximation, however, remains an important open theoretical question. This paper investigates the convergence and the effects of different eligibility traces. We apply the Sarsa(λ) control algorithm in a large, stochastic and dynamic simulation environment called SoccerBots. The state value function is represented by a linear function approximator known as tile coding. The performance metrics generated by the simulation system can be used to analyse the mechanism of eligibility traces.
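To make the ingredients named in the abstract concrete, the following is a minimal sketch of Sarsa(λ) with tile-coded linear function approximation. It is not the authors' implementation. It assumes that the "different eligibility traces" are the two common variants, accumulating and replacing traces, and it replaces SoccerBots with a hypothetical one-dimensional stochastic corridor. All names, step sizes, and tiling parameters are illustrative assumptions.

```python
import random

N_TILINGS, N_TILES, N_ACTIONS = 4, 8, 2         # illustrative sizes, not from the paper
TILES_PER_TILING = N_TILES + 1                  # one extra tile absorbs the tiling offset
N_FEATURES = N_TILINGS * TILES_PER_TILING * N_ACTIONS

def active_features(x, a):
    """Tile-code a position x in [0, 1] for action a: one active index per tiling."""
    idx = []
    for t in range(N_TILINGS):
        shifted = x + t / (N_TILINGS * N_TILES)  # each tiling is shifted by a fraction of a tile
        tile = min(int(shifted * N_TILES), N_TILES)
        idx.append((a * N_TILINGS + t) * TILES_PER_TILING + tile)
    return idx

def q_value(w, x, a):
    """Linear action value: the sum of the weights of the active binary features."""
    return sum(w[i] for i in active_features(x, a))

def choose(w, x, eps, rng):
    """Epsilon-greedy action selection."""
    if rng.random() < eps:
        return rng.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q_value(w, x, a))

def step(x, a, rng):
    """Hypothetical toy corridor: action 1 moves right, action 0 moves left, with noise."""
    move = 0.1 if a == 1 else -0.1
    x = min(max(x + move + rng.uniform(-0.02, 0.02), 0.0), 1.0)
    return x, -1.0, x >= 1.0                     # reward of -1 per step until the right edge

def sarsa_lambda(trace="replacing", episodes=100, alpha=0.1, gamma=1.0,
                 lam=0.9, eps=0.1, max_steps=500, seed=0):
    """Run Sarsa(lambda); trace is 'replacing' or 'accumulating'."""
    rng = random.Random(seed)
    w = [0.0] * N_FEATURES
    steps_per_episode = []
    for _ in range(episodes):
        z = [0.0] * N_FEATURES                   # eligibility trace, reset each episode
        x = 0.0
        a = choose(w, x, eps, rng)
        for t in range(max_steps):
            x2, r, done = step(x, a, rng)
            delta = r - q_value(w, x, a)         # TD error, completed below if not terminal
            for i in active_features(x, a):
                if trace == "replacing":
                    z[i] = 1.0                   # replacing trace: reset to 1
                else:
                    z[i] += 1.0                  # accumulating trace: add 1
            if not done:
                a2 = choose(w, x2, eps, rng)
                delta += gamma * q_value(w, x2, a2)
            for i in range(N_FEATURES):
                w[i] += (alpha / N_TILINGS) * delta * z[i]
                z[i] *= gamma * lam              # decay every trace by gamma * lambda
            if done:
                break
            x, a = x2, a2
        steps_per_episode.append(t + 1)
    return w, steps_per_episode
```

Calling `sarsa_lambda(trace="accumulating")` and `sarsa_lambda(trace="replacing")` with the same seed gives two per-episode step counts that can be compared, which is the kind of performance metric the abstract refers to. Dividing the step size by the number of tilings is a common convention for tile coding and is an assumption here, not a detail taken from the paper.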