Speeding up autonomous learning by using state-independent option policies and termination improvement

Letícia Maria Friske, C. Ribeiro
DOI: 10.1109/SBRN.2002.1181488
Published in: VII Brazilian Symposium on Neural Networks, 2002. SBRN 2002. Proceedings.
Publication date: 2002-11-11
Citations: 1

Abstract

In reinforcement learning applications such as autonomous robot navigation, the use of options (macro-operators) instead of low-level actions has been reported to produce learning speedup due to a more aggressive exploration of the state space. In this paper we present an evaluation of the use of option policies O_S. Each option policy in this framework is a fixed sequence of actions that depends exclusively on the state in which the option is initiated. This contrasts with option policies O_Π, more common in the literature, which correspond to action sequences that depend on the states visited during the execution of the option. One of our goals was to analyse the effects of varying the action sequence length for O_S policies. The main contribution of the paper, however, is a study of a termination improvement (TI) technique that allows option execution to be aborted when a more promising option is found. Experimental results show that TI for O_S options, whose benefits had already been reported for O_Π options, can be much more effective than indiscriminately increasing the option size to boost exploration of the state space, because TI adapts the length of the action sequence to the state in which the option is initiated.
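The termination-improvement idea for O_S options can be sketched in a few lines: an agent commits to a fixed action sequence chosen in the initiation state, but aborts it as soon as some other option looks strictly more promising from the state it has reached. The following is a minimal sketch, not the paper's experimental setup: the corridor environment, option set, and hyperparameters are all illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical 1-D corridor: states 0..N-1, goal at N-1, small step penalty.
N = 10
# O_S option policies: fixed action sequences, chosen once in the
# initiation state (-1 = left, +1 = right).
OPTIONS = [(-1,) * 3, (1,) * 3, (-1,), (1,)]

def step(s, a):
    """One primitive step: move left/right, clamped to the corridor."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else -0.01), s2 == N - 1

def train(episodes=500, use_ti=True, alpha=0.1, gamma=0.95, eps=0.1):
    """Q-learning over O_S options, with optional termination improvement."""
    Q = {(s, o): 0.0 for s in range(N) for o in range(len(OPTIONS))}
    greedy = lambda s: max(range(len(OPTIONS)), key=lambda k: Q[(s, k)])
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            o = random.randrange(len(OPTIONS)) if random.random() < eps else greedy(s)
            s0, ret, disc = s, 0.0, 1.0
            for a in OPTIONS[o]:          # execute the fixed action sequence
                s, r, done = step(s, a)
                ret += disc * r
                disc *= gamma
                if done:
                    break
                # Termination improvement: abort the sequence as soon as
                # another option looks strictly better from the current state.
                if use_ti and max(Q[(s, k)] for k in range(len(OPTIONS))) > Q[(s, o)]:
                    break
            target = ret + (0.0 if done else
                            disc * max(Q[(s, k)] for k in range(len(OPTIONS))))
            Q[(s0, o)] += alpha * (target - Q[(s0, o)])
    return Q, greedy

Q, greedy = train()
print(OPTIONS[greedy(0)])  # the learned greedy option in the start state
```

With TI disabled, long options are committed to blindly; with TI enabled, an option's effective length adapts to the state in which it was initiated, which is the effect the paper reports as more beneficial than simply lengthening the sequences.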