Fast and feasible reinforcement learning algorithm

S. Ono, Y. Inagaki, H. Aisu, H. Sugie, T. Unemi
{"title":"快速可行的强化学习算法","authors":"S. Ono, Y. Inagaki, H. Aisu, H. Sugie, T. Unemi","doi":"10.1109/FUZZY.1995.409907","DOIUrl":null,"url":null,"abstract":"It is desirable that agents can determine themselves the next action to execute, using fast and feasible learning algorithm to adapt itself the dynamic environments. The reinforcement learning method is suitable for learning in an a priori known environment. We have improved IBRL1 (Instance-Based Reinforcement Learning 1), which is based on the instance-based learning approach, to increase the convergence and feasibility of learning in a grid world. It is supposed that the learning agents do not themselves know the correct position in the grid world, but that they receive inputs from their sensors. Thus, agents are faced with what is known as the hidden state problem. The payment of immediate cost in a bucket brigade algorithm, the distribution of delayed reward by profit sharing, and the use of a time series achieves fast and feasible convergence in environments that include the hidden state problem. The capability of this algorithm is demonstrated in the grid world. By using this algorithm, our robot in the simulation is able to learn the path to the goal. Experiment demonstrates a learning effect through decline in the spent steps during repetitions of goal search.<<ETX>>","PeriodicalId":150477,"journal":{"name":"Proceedings of 1995 IEEE International Conference on Fuzzy Systems.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Fast and feasible reinforcement learning algorithm\",\"authors\":\"S. Ono, Y. Inagaki, H. Aisu, H. Sugie, T. Unemi\",\"doi\":\"10.1109/FUZZY.1995.409907\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It is desirable that agents can determine themselves the next action to execute, using fast and feasible learning algorithm to adapt itself the dynamic environments. The reinforcement learning method is suitable for learning in an a priori known environment. We have improved IBRL1 (Instance-Based Reinforcement Learning 1), which is based on the instance-based learning approach, to increase the convergence and feasibility of learning in a grid world. It is supposed that the learning agents do not themselves know the correct position in the grid world, but that they receive inputs from their sensors. Thus, agents are faced with what is known as the hidden state problem. The payment of immediate cost in a bucket brigade algorithm, the distribution of delayed reward by profit sharing, and the use of a time series achieves fast and feasible convergence in environments that include the hidden state problem. The capability of this algorithm is demonstrated in the grid world. By using this algorithm, our robot in the simulation is able to learn the path to the goal. 
Experiment demonstrates a learning effect through decline in the spent steps during repetitions of goal search.<<ETX>>\",\"PeriodicalId\":150477,\"journal\":{\"name\":\"Proceedings of 1995 IEEE International Conference on Fuzzy Systems.\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-03-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 1995 IEEE International Conference on Fuzzy Systems.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FUZZY.1995.409907\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1995 IEEE International Conference on Fuzzy Systems.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FUZZY.1995.409907","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6

Abstract

It is desirable that agents can determine for themselves the next action to execute, using a fast and feasible learning algorithm to adapt to dynamic environments. The reinforcement learning method is suitable for learning in an a priori known environment. We have improved IBRL1 (Instance-Based Reinforcement Learning 1), which is based on the instance-based learning approach, to increase the convergence and feasibility of learning in a grid world. The learning agents are assumed not to know their true position in the grid world; instead, they receive inputs only from their sensors. Agents are therefore faced with what is known as the hidden state problem. The payment of immediate cost as in a bucket brigade algorithm, the distribution of delayed reward by profit sharing, and the use of a time series together achieve fast and feasible convergence in environments that include the hidden state problem. The capability of this algorithm is demonstrated in the grid world: using it, the simulated robot is able to learn a path to the goal. Experiments demonstrate a learning effect through a decline in the number of steps spent over repeated goal searches.
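The following is a minimal sketch, not the published IBRL1 algorithm, of how the three ingredients named in the abstract can fit together: an immediate cost and a bucket-brigade-style backward payment of strength when a rule fires, profit sharing of the delayed goal reward over the rules fired in an episode, and a short time series of sensor observations and actions used as the state key to cope with hidden state. The grid layout, sensor model, constants, and function names below are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of bucket-brigade cost, profit sharing, and a time-series
# state key in a grid world with ambiguous local sensing (hidden state).
import random
from collections import defaultdict, deque

GRID = ["#######",
        "#S...G#",
        "#.#.#.#",
        "#.....#",
        "#######"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
START, GOAL = (1, 1), (1, 5)

def sense(r, c):
    """Local sensor reading: wall pattern of the four neighbours.
    Different cells can produce the same reading -- the hidden state problem."""
    return tuple(GRID[r + dr][c + dc] == '#' for dr, dc in ACTIONS)

Q = defaultdict(float)          # strength of (history, action) rules
HISTORY_LEN = 2                 # length of the observation/action time series
STEP_COST, BID_RATE = 0.05, 0.2
REWARD, DISCOUNT, EPSILON = 1.0, 0.9, 0.2

def run_episode():
    r, c = START
    history = deque(maxlen=HISTORY_LEN)
    history.append(sense(r, c))
    fired = []                                  # rules fired, for profit sharing
    for step in range(200):
        key = tuple(history)                    # time series as the state key
        if random.random() < EPSILON:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(key, i)])
        # Bucket-brigade-style payment: the firing rule pays an immediate cost
        # and passes a fraction of its strength back to the previously fired rule.
        bid = BID_RATE * max(Q[(key, a)], 0.0)
        Q[(key, a)] -= STEP_COST + bid
        if fired:
            Q[fired[-1]] += bid
        fired.append((key, a))
        dr, dc = ACTIONS[a]
        if GRID[r + dr][c + dc] != '#':
            r, c = r + dr, c + dc
        history.append((sense(r, c), a))
        if (r, c) == GOAL:
            # Profit sharing: distribute the delayed reward over every rule
            # fired in the episode, weighted toward the most recent ones.
            for age, rule in enumerate(reversed(fired)):
                Q[rule] += REWARD * (DISCOUNT ** age)
            return step + 1
    return None                                  # goal not reached this episode

for trial in range(50):
    steps = run_episode()
    if trial % 10 == 0:
        print(f"trial {trial:2d}: steps to goal = {steps}")
```

Run repeatedly, the printed step counts tend to fall across trials, which mirrors the learning effect the abstract reports (fewer steps spent over repeated goal searches), though the exact update rules here are only an assumed approximation of the paper's method.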