{"title":"Speeding up autonomous learning by using state-independent option policies and termination improvement","authors":"Letícia Maria Friske, C. Ribeiro","doi":"10.1109/SBRN.2002.1181488","DOIUrl":null,"url":null,"abstract":"In reinforcement learning applications such as autonomous robot navigation, the use of options (macro-operators) instead of low level actions has been reported to produce learning speedup due to a more aggressive exploration of the state space. In this paper we present an evaluation of the use of option policies O/sub S/. Each option policy in this framework is a fixed sequence of actions, depending exclusively on the state in which the option is initiated. This contrasts with option policies O/sub /spl Pi//, more common in the literature and that correspond to action sequences that depend on the states visited during the execution of the options. One of our goals was to analyse the effects of a variation of the action sequence length for O/sub S/ policies. The main contribution of the paper, however, is a study on the use of a termination improvement (TI) technique which allows for the abortion of option execution if a more promising one is found. Experimental results show that TI for O/sub S/ options, whose benefits had already been reported for O/sub /spl Pi// options, can be much more effective - due to its adaptation of the size of the action sequence depending on the state where the option is initiated - than indiscriminately augmenting the option size in order to increase exploration of the state space.","PeriodicalId":157186,"journal":{"name":"VII Brazilian Symposium on Neural Networks, 2002. SBRN 2002. Proceedings.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"VII Brazilian Symposium on Neural Networks, 2002. SBRN 2002. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBRN.2002.1181488","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
In reinforcement learning applications such as autonomous robot navigation, the use of options (macro-operators) instead of low-level actions has been reported to produce learning speedup due to a more aggressive exploration of the state space. In this paper we present an evaluation of the use of option policies O_S. Each option policy in this framework is a fixed sequence of actions that depends exclusively on the state in which the option is initiated. This contrasts with option policies O_Π, which are more common in the literature and correspond to action sequences that depend on the states visited during the execution of the option. One of our goals was to analyse the effect of varying the action sequence length for O_S policies. The main contribution of the paper, however, is a study of a termination improvement (TI) technique that allows option execution to be aborted if a more promising option is found. Experimental results show that TI for O_S options, whose benefits had already been reported for O_Π options, can be much more effective than indiscriminately increasing the option size to boost exploration of the state space, because TI adapts the length of the action sequence to the state in which the option is initiated.
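To make the interplay between O_S-style options and termination improvement more concrete, the sketch below shows one plausible way to render TI inside SMDP Q-learning over options. It is not the authors' implementation: the environment interface (env.reset(), env.step(action) returning (next_state, reward, done)), the hypothetical option_seq(s, o) mapping an initiation state and option to its fixed action sequence, and all hyperparameters are assumptions made only for illustration.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (assumed, not from the paper).
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1


def greedy_option(Q, s, options):
    # Pick the option with the highest current value estimate in state s.
    return max(options, key=lambda o: Q[(s, o)])


def run_option_with_ti(env, Q, s, o, options, option_seq):
    # Execute option o from its initiation state s. With termination
    # improvement (TI), execution is aborted as soon as some other option
    # looks strictly more promising in the state reached so far.
    total_r, discount, k, state, done = 0.0, 1.0, 0, s, False
    for a in option_seq(s, o):  # fixed action sequence, in the spirit of O_S
        state, r, done = env.step(a)
        total_r += discount * r
        discount *= GAMMA
        k += 1
        if done or Q[(state, o)] < max(Q[(state, o2)] for o2 in options):
            break  # TI: the current option is no longer the greedy choice
    return state, total_r, k, done


def smdp_q_learning(env, options, option_seq, episodes=100):
    # SMDP Q-learning over options: each backup spans the k primitive steps
    # the (possibly interrupted) option actually lasted.
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            o = (random.choice(options) if random.random() < EPSILON
                 else greedy_option(Q, s, options))
            s2, r, k, done = run_option_with_ti(env, Q, s, o, options, option_seq)
            backup = 0.0 if done else max(Q[(s2, o2)] for o2 in options)
            Q[(s, o)] += ALPHA * (r + (GAMMA ** k) * backup - Q[(s, o)])
            s = s2
    return Q
```

Removing the TI check in run_option_with_ti recovers plain O_S execution, where the full fixed action sequence is always played out; the comparison in the paper is essentially between that behaviour, longer action sequences, and TI's state-dependent early cutoff.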