{"title":"Fast and feasible reinforcement learning algorithm","authors":"S. Ono, Y. Inagaki, H. Aisu, H. Sugie, T. Unemi","doi":"10.1109/FUZZY.1995.409907","DOIUrl":null,"url":null,"abstract":"It is desirable that agents determine for themselves the next action to execute, using a fast and feasible learning algorithm to adapt to dynamic environments. The reinforcement learning method is suitable for learning in an a priori known environment. We have improved IBRL1 (Instance-Based Reinforcement Learning 1), which is based on the instance-based learning approach, to increase the convergence speed and feasibility of learning in a grid world. The learning agents are assumed not to know their correct position in the grid world; they receive only inputs from their sensors. Thus, the agents face what is known as the hidden state problem. The payment of immediate cost in a bucket brigade algorithm, the distribution of delayed reward by profit sharing, and the use of a time series together achieve fast and feasible convergence in environments that include the hidden state problem. The capability of this algorithm is demonstrated in the grid world: using it, our simulated robot is able to learn the path to the goal. Experiments demonstrate a learning effect through a decline in the number of steps taken over repeated goal searches.","PeriodicalId":150477,"journal":{"name":"Proceedings of 1995 IEEE International Conference on Fuzzy Systems.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1995 IEEE International Conference on Fuzzy Systems.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FUZZY.1995.409907","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
It is desirable that agents determine for themselves the next action to execute, using a fast and feasible learning algorithm to adapt to dynamic environments. The reinforcement learning method is suitable for learning in an a priori known environment. We have improved IBRL1 (Instance-Based Reinforcement Learning 1), which is based on the instance-based learning approach, to increase the convergence speed and feasibility of learning in a grid world. The learning agents are assumed not to know their correct position in the grid world; they receive only inputs from their sensors. Thus, the agents face what is known as the hidden state problem. The payment of immediate cost in a bucket brigade algorithm, the distribution of delayed reward by profit sharing, and the use of a time series together achieve fast and feasible convergence in environments that include the hidden state problem. The capability of this algorithm is demonstrated in the grid world: using it, our simulated robot is able to learn the path to the goal. Experiments demonstrate a learning effect through a decline in the number of steps taken over repeated goal searches.
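The two ideas central to the abstract, distributing the delayed goal reward backward over the episode by profit sharing, and keying the value table on a short time series of sensor readings to cope with hidden states, can be sketched minimally as follows. This is an illustrative toy in Python, not the authors' IBRL1 implementation; the decay rate, history length, and observation encoding are all assumptions made for the example.

```python
from collections import defaultdict, deque

def profit_sharing_update(q, episode, reward, decay=0.5):
    """Distribute the delayed goal reward backward along the episode's
    (state, action) sequence with geometric decay (profit sharing).
    Steps closer to the goal receive more credit."""
    credit = reward
    for state, action in reversed(episode):
        q[(state, action)] += credit
        credit *= decay

# Hidden-state workaround: key the table on a short history (time
# series) of sensor readings rather than a single reading, so that
# perceptually identical grid cells can still be told apart.
history = deque(maxlen=2)
q = defaultdict(float)
episode = []
for obs, action in [("open", "right"), ("wall", "up"), ("open", "up")]:
    history.append(obs)
    episode.append((tuple(history), action))

profit_sharing_update(q, episode, reward=1.0)
# The final step gets full credit 1.0, the previous one 0.5,
# and the first one 0.25.
```

Note that the two "open" readings produce different state keys, `("open",)` and `("wall", "open")`, which is exactly how a time series disambiguates hidden states.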