Fast and feasible reinforcement learning algorithm

S. Ono, Y. Inagaki, H. Aisu, H. Sugie, T. Unemi
{"title":"快速可行的强化学习算法","authors":"S. Ono, Y. Inagaki, H. Aisu, H. Sugie, T. Unemi","doi":"10.1109/FUZZY.1995.409907","DOIUrl":null,"url":null,"abstract":"It is desirable that agents can determine themselves the next action to execute, using fast and feasible learning algorithm to adapt itself the dynamic environments. The reinforcement learning method is suitable for learning in an a priori known environment. We have improved IBRL1 (Instance-Based Reinforcement Learning 1), which is based on the instance-based learning approach, to increase the convergence and feasibility of learning in a grid world. It is supposed that the learning agents do not themselves know the correct position in the grid world, but that they receive inputs from their sensors. Thus, agents are faced with what is known as the hidden state problem. The payment of immediate cost in a bucket brigade algorithm, the distribution of delayed reward by profit sharing, and the use of a time series achieves fast and feasible convergence in environments that include the hidden state problem. The capability of this algorithm is demonstrated in the grid world. By using this algorithm, our robot in the simulation is able to learn the path to the goal. Experiment demonstrates a learning effect through decline in the spent steps during repetitions of goal search.<<ETX>>","PeriodicalId":150477,"journal":{"name":"Proceedings of 1995 IEEE International Conference on Fuzzy Systems.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Fast and feasible reinforcement learning algorithm\",\"authors\":\"S. Ono, Y. Inagaki, H. Aisu, H. Sugie, T. Unemi\",\"doi\":\"10.1109/FUZZY.1995.409907\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It is desirable that agents can determine themselves the next action to execute, using fast and feasible learning algorithm to adapt itself the dynamic environments. The reinforcement learning method is suitable for learning in an a priori known environment. We have improved IBRL1 (Instance-Based Reinforcement Learning 1), which is based on the instance-based learning approach, to increase the convergence and feasibility of learning in a grid world. It is supposed that the learning agents do not themselves know the correct position in the grid world, but that they receive inputs from their sensors. Thus, agents are faced with what is known as the hidden state problem. The payment of immediate cost in a bucket brigade algorithm, the distribution of delayed reward by profit sharing, and the use of a time series achieves fast and feasible convergence in environments that include the hidden state problem. The capability of this algorithm is demonstrated in the grid world. By using this algorithm, our robot in the simulation is able to learn the path to the goal. 
Experiment demonstrates a learning effect through decline in the spent steps during repetitions of goal search.<<ETX>>\",\"PeriodicalId\":150477,\"journal\":{\"name\":\"Proceedings of 1995 IEEE International Conference on Fuzzy Systems.\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-03-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 1995 IEEE International Conference on Fuzzy Systems.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FUZZY.1995.409907\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1995 IEEE International Conference on Fuzzy Systems.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FUZZY.1995.409907","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6

Abstract

It is desirable that agents can determine for themselves the next action to execute, using a fast and feasible learning algorithm to adapt to dynamic environments. The reinforcement learning method is suitable for learning in an a priori known environment. We have improved IBRL1 (Instance-Based Reinforcement Learning 1), which is based on the instance-based learning approach, to increase the convergence and feasibility of learning in a grid world. The learning agents are assumed not to know their true position in the grid world; instead, they receive inputs only from their sensors. Agents are therefore faced with what is known as the hidden state problem. The payment of immediate cost as in a bucket brigade algorithm, the distribution of delayed reward by profit sharing, and the use of a time series together achieve fast and feasible convergence in environments that include the hidden state problem. The capability of this algorithm is demonstrated in the grid world: using it, the simulated robot is able to learn a path to the goal. Experiments demonstrate a learning effect through a decline in the number of steps spent over repeated goal searches.
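The following is a minimal sketch, not the published IBRL1 algorithm, of how the three ingredients named in the abstract can fit together: an immediate cost and a bucket-brigade-style backward payment of strength when a rule fires, profit sharing of the delayed goal reward over the rules fired in an episode, and a short time series of sensor observations and actions used as the state key to cope with hidden state. The grid layout, sensor model, constants, and function names below are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of bucket-brigade cost, profit sharing, and a time-series
# state key in a grid world with ambiguous local sensing (hidden state).
import random
from collections import defaultdict, deque

GRID = ["#######",
        "#S...G#",
        "#.#.#.#",
        "#.....#",
        "#######"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
START, GOAL = (1, 1), (1, 5)

def sense(r, c):
    """Local sensor reading: wall pattern of the four neighbours.
    Different cells can produce the same reading -- the hidden state problem."""
    return tuple(GRID[r + dr][c + dc] == '#' for dr, dc in ACTIONS)

Q = defaultdict(float)          # strength of (history, action) rules
HISTORY_LEN = 2                 # length of the observation/action time series
STEP_COST, BID_RATE = 0.05, 0.2
REWARD, DISCOUNT, EPSILON = 1.0, 0.9, 0.2

def run_episode():
    r, c = START
    history = deque(maxlen=HISTORY_LEN)
    history.append(sense(r, c))
    fired = []                                  # rules fired, for profit sharing
    for step in range(200):
        key = tuple(history)                    # time series as the state key
        if random.random() < EPSILON:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(key, i)])
        # Bucket-brigade-style payment: the firing rule pays an immediate cost
        # and passes a fraction of its strength back to the previously fired rule.
        bid = BID_RATE * max(Q[(key, a)], 0.0)
        Q[(key, a)] -= STEP_COST + bid
        if fired:
            Q[fired[-1]] += bid
        fired.append((key, a))
        dr, dc = ACTIONS[a]
        if GRID[r + dr][c + dc] != '#':
            r, c = r + dr, c + dc
        history.append((sense(r, c), a))
        if (r, c) == GOAL:
            # Profit sharing: distribute the delayed reward over every rule
            # fired in the episode, weighted toward the most recent ones.
            for age, rule in enumerate(reversed(fired)):
                Q[rule] += REWARD * (DISCOUNT ** age)
            return step + 1
    return None                                  # goal not reached this episode

for trial in range(50):
    steps = run_episode()
    if trial % 10 == 0:
        print(f"trial {trial:2d}: steps to goal = {steps}")
```

Run repeatedly, the printed step counts tend to fall across trials, which mirrors the learning effect the abstract reports (fewer steps spent over repeated goal searches), though the exact update rules here are only an assumed approximation of the paper's method.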