WMat algorithm based on Q-Learning algorithm in taxi-v2 game

2020 4th International Conference on Smart City, Internet of Things and Applications (SCIOT) Pub Date : 2020-09-16 DOI:10.1109/SCIOT50840.2020.9250211

Fatemeh Esmaeily, M. Keyvanpour

Abstract

Reinforcement learning is a framework in which an agent, following a specific policy, aims to maximize the sum of the rewards it receives from its environment. Although much research has been done on this type of learning, it has not specifically addressed improving Q-table values and increasing the amount of reward received by the agent. One notable issue in this regard is improving the results of the Q-learning algorithm. Hence, to increase the amount of received reward, a weighting matrix is proposed and defined over the set of the agent's possible actions. The proposed method reduces the likelihood that the agent considers certain behaviours, which notably decreases action-selection time and, as a result, improves the amount of reward received and produces acceptable results on this criterion.
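The abstract does not say how the weighting matrix is built or applied, so the following is only a minimal sketch of one possible reading, not the authors' WMat algorithm. It pairs the standard tabular Q-learning update with an epsilon-greedy selection step that skips actions whose weight is zero. The names `q_update`, `choose_action`, and `weights`, the zero-weight masking rule, and the assumption that every state has at least one positively weighted action are all hypothetical.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place to q[state][action]."""
    best_next = max(q[next_state])
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error


def choose_action(q, weights, state, epsilon, rng):
    """Epsilon-greedy selection restricted by a per-state action weighting.

    Assumption (not from the paper): weights[state][a] == 0 means the agent
    never considers action a in that state, and every state keeps at least
    one action with a positive weight.
    """
    allowed = [a for a, w in enumerate(weights[state]) if w > 0]
    if rng.random() < epsilon:
        # Explore: sample among allowed actions in proportion to their weights.
        allowed_weights = [weights[state][a] for a in allowed]
        return rng.choices(allowed, weights=allowed_weights)[0]
    # Exploit: pick the allowed action with the highest Q-value.
    return max(allowed, key=lambda a: q[state][a])


# Toy usage with 2 states and 3 actions (hypothetical values).
rng = random.Random(0)
q_table = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.5]]
weight_matrix = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]  # action 0 is masked in state 0

action = choose_action(q_table, weight_matrix, state=0, epsilon=0.2, rng=rng)
q_update(q_table, state=0, action=action, reward=1.0, next_state=1)
```

Under this reading, masking shrinks the set of actions scanned at each step, which is one way selection time could fall; whether the paper's matrix works this way is not stated in the abstract.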
