Speeding up autonomous learning by using state-independent option policies and termination improvement

Letícia Maria Friske, C. Ribeiro
DOI: 10.1109/SBRN.2002.1181488
Published in: VII Brazilian Symposium on Neural Networks, 2002. SBRN 2002. Proceedings.
Publication date: 2002-11-11
Citations: 1

Abstract

In reinforcement learning applications such as autonomous robot navigation, the use of options (macro-operators) instead of low-level actions has been reported to produce learning speedup due to a more aggressive exploration of the state space. In this paper we present an evaluation of the use of option policies O_S. Each option policy in this framework is a fixed sequence of actions that depends exclusively on the state in which the option is initiated. This contrasts with option policies O_Π, more common in the literature, which correspond to action sequences that depend on the states visited during the execution of the option. One of our goals was to analyse the effects of varying the action sequence length for O_S policies. The main contribution of the paper, however, is a study of a termination improvement (TI) technique that allows option execution to be aborted when a more promising option is found. Experimental results show that TI for O_S options, whose benefits had already been reported for O_Π options, can be much more effective than indiscriminately increasing the option size to boost exploration of the state space, because TI adapts the length of the action sequence to the state in which the option is initiated.
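The termination-improvement idea for O_S options can be sketched in a few lines: an agent commits to a fixed action sequence chosen in the initiation state, but aborts it as soon as some other option looks strictly more promising from the state it has reached. The following is a minimal sketch, not the paper's experimental setup: the corridor environment, option set, and hyperparameters are all illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical 1-D corridor: states 0..N-1, goal at N-1, small step penalty.
N = 10
# O_S option policies: fixed action sequences, chosen once in the
# initiation state (-1 = left, +1 = right).
OPTIONS = [(-1,) * 3, (1,) * 3, (-1,), (1,)]

def step(s, a):
    """One primitive step: move left/right, clamped to the corridor."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else -0.01), s2 == N - 1

def train(episodes=500, use_ti=True, alpha=0.1, gamma=0.95, eps=0.1):
    """Q-learning over O_S options, with optional termination improvement."""
    Q = {(s, o): 0.0 for s in range(N) for o in range(len(OPTIONS))}
    greedy = lambda s: max(range(len(OPTIONS)), key=lambda k: Q[(s, k)])
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            o = random.randrange(len(OPTIONS)) if random.random() < eps else greedy(s)
            s0, ret, disc = s, 0.0, 1.0
            for a in OPTIONS[o]:          # execute the fixed action sequence
                s, r, done = step(s, a)
                ret += disc * r
                disc *= gamma
                if done:
                    break
                # Termination improvement: abort the sequence as soon as
                # another option looks strictly better from the current state.
                if use_ti and max(Q[(s, k)] for k in range(len(OPTIONS))) > Q[(s, o)]:
                    break
            target = ret + (0.0 if done else
                            disc * max(Q[(s, k)] for k in range(len(OPTIONS))))
            Q[(s0, o)] += alpha * (target - Q[(s0, o)])
    return Q, greedy

Q, greedy = train()
print(OPTIONS[greedy(0)])  # the learned greedy option in the start state
```

With TI disabled, long options are committed to blindly; with TI enabled, an option's effective length adapts to the state in which it was initiated, which is the effect the paper reports as more beneficial than simply lengthening the sequences.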