Learning to deal with objects

M. Malfaz, M. Salichs
{"title":"Learning to deal with objects","authors":"M. Malfaz, M. Salichs","doi":"10.1109/DEVLRN.2009.5175508","DOIUrl":null,"url":null,"abstract":"In this paper, a modification of the standard learning algorithm Q-learning is presented: Object Q-learning (OQ-learning). An autonomous agent should be able to decide its own goals and behaviours in order to fulfil these goals. When the agent has no previous knowledge, it must learn what to do in every state (policy of behaviour). If the agent uses Q-learning, this implies that it learns the utility value Q of each action-state pair. Typically, an autonomous agent living in a complex environment has to interact with different objects present in that world. In this case, the number of states of the agent in relation to those objects may increase as the number of objects increases, making the learning process difficult to deal with. The proposed modification appears as a solution in order to cope with this problem. The experimental results prove the usefulness of the OQ-learning in this situation, in comparison with the standard Q-learning algorithm.","PeriodicalId":192225,"journal":{"name":"2009 IEEE 8th International Conference on Development and Learning","volume":"101 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE 8th International Conference on Development and Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DEVLRN.2009.5175508","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

In this paper, a modification of the standard Q-learning algorithm is presented: Object Q-learning (OQ-learning). An autonomous agent should be able to decide its own goals and the behaviours needed to fulfil them. When the agent has no previous knowledge, it must learn what to do in every state, i.e. a behaviour policy. If the agent uses Q-learning, this means it learns the utility value Q of each state-action pair. Typically, an autonomous agent living in a complex environment has to interact with the different objects present in that world. In this case, the number of states the agent can be in with respect to those objects may grow rapidly as the number of objects increases, making learning intractable. The proposed modification is a way to cope with this problem. The experimental results demonstrate the advantage of OQ-learning over the standard Q-learning algorithm in this situation.
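The abstract gives no pseudocode, so the following is a minimal Python sketch of the tabular Q-learning update it refers to, together with a hypothetical per-object value decomposition in the spirit of OQ-learning. The epsilon-greedy policy, the toy state and action names, and the sum used to combine per-object values are illustrative assumptions, not the paper's exact formulation.

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

    def epsilon_greedy(Q, s, actions):
        # Explore with probability EPSILON, otherwise exploit current estimates.
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    def q_update(Q, s, a, r, s_next, actions):
        # Standard Q-learning backup:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    # Hypothetical per-object decomposition in the spirit of OQ-learning:
    # one Q-table per object, each indexed by that object's sub-state only,
    # so storage grows with the number of objects rather than with the
    # product of their state spaces. Summing the per-object values is an
    # illustrative assumption, not the paper's combination rule.
    def combined_q(Q_per_object, sub_states, a):
        return sum(Q_obj[(sub_states[obj], a)]
                   for obj, Q_obj in Q_per_object.items())

    if __name__ == "__main__":
        actions = ["approach", "grasp", "ignore"]  # toy action set
        Q = defaultdict(float)
        q_update(Q, "near_obj", "grasp", 1.0, "holding_obj", actions)
        print(epsilon_greedy(Q, "near_obj", actions))

The point of the per-object decomposition is that each table covers only one object's sub-states, so the total number of learned values scales linearly with the number of objects instead of combinatorially with the joint state space.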