WMat algorithm based on Q-Learning algorithm in taxi-v2 game

Fatemeh Esmaeily, M. Keyvanpour
Published in: 2020 4th International Conference on Smart City, Internet of Things and Applications (SCIOT), 2020-09-16
DOI: 10.1109/SCIOT50840.2020.9250211 (https://doi.org/10.1109/SCIOT50840.2020.9250211)
Citations: 0

Abstract

Reinforcement learning is a framework in which an agent aims to optimize the sum of the rewards it receives from its environment by following a specific policy. Although much research has been done on this type of learning, little of it has specifically addressed improving Q-table values and increasing the amount of reward received by the agent. One notable issue in this regard is improving the results of the Q-learning algorithm. To increase the received reward, a weighting matrix is proposed and defined over the set of possible actions of the agent. The method reduces the likelihood that the agent considers certain behaviours, which notably decreases the time spent choosing an action and, as a result, improves the amount of reward and produces acceptable results on this criterion.
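The abstract does not say how the weighting matrix is built or applied, so the following is only a minimal sketch of one plausible reading, not the authors' WMat algorithm. It assumes a per-state, per-action weight table in which a zero weight removes an action from consideration and positive weights bias exploration. The corridor environment, the function names (`step`, `choose_action`, `train`) and all hyperparameters are hypothetical; the rewards (-1 per step, +20 at the goal, -10 for an illegal action) loosely mirror those of the Taxi task.

```python
import random

N_STATES, N_ACTIONS, GOAL = 5, 3, 4  # actions: 0 = left, 1 = right, 2 = illegal


def step(state, action):
    """Toy corridor (hypothetical stand-in for Taxi): return (next_state, reward, done)."""
    if action == 2:                      # illegal action: penalty, no movement
        return state, -10, False
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    if nxt == GOAL:
        return nxt, 20, True
    return nxt, -1, False


def allowed_actions(w_row):
    """Actions with a positive weight; zero-weight actions are never considered."""
    return [a for a in range(N_ACTIONS) if w_row[a] > 0]


def choose_action(q_row, w_row, epsilon, rng):
    """Epsilon-greedy selection restricted and biased by the weight row (assumed rule)."""
    allowed = allowed_actions(w_row)
    if rng.random() < epsilon:
        # Explore: sample allowed actions in proportion to their weights.
        return rng.choices(allowed, weights=[w_row[a] for a in allowed])[0]
    # Exploit: best Q-value among the allowed actions only.
    return max(allowed, key=lambda a: q_row[a])


def train(weights, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning whose action choice and bootstrap use only allowed actions."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            action = choose_action(q[state], weights[state], epsilon, rng)
            nxt, reward, done = step(state, action)
            best_next = max(q[nxt][a] for a in allowed_actions(weights[nxt]))
            target = reward if done else reward + gamma * best_next
            q[state][action] += alpha * (target - q[state][action])
            state, steps = nxt, steps + 1
    return q


# Assumed weight matrix: the illegal action gets weight 0 in every state.
weights = [[1.0, 1.0, 0.0] for _ in range(N_STATES)]
q_table = train(weights)
```

Because the zero-weight action is never sampled, its Q-values stay at their initial value and the agent compares fewer candidates at each step, which is the kind of reduction in considered behaviours the abstract describes. Whether the paper uses hard masking, soft weighting, or something else is not stated in the source.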