WMat algorithm based on Q-Learning algorithm in taxi-v2 game
Fatemeh Esmaeily, M. Keyvanpour
2020 4th International Conference on Smart City, Internet of Things and Applications (SCIOT), published 2020-09-16
DOI: 10.1109/SCIOT50840.2020.9250211 (https://doi.org/10.1109/SCIOT50840.2020.9250211)
Citations: 0
Abstract
Reinforcement learning is a framework in which an agent, following a specific policy, aims to maximize the sum of the rewards it receives from its environment. Although this type of learning has been studied extensively, little work has specifically addressed improving Q-table values in order to increase the reward received by the agent. One notable open issue is improving the results produced by the Q-learning algorithm. To increase the received reward, we propose a weighting matrix defined over the agent's set of possible actions. The method reduces the likelihood that the agent considers certain behaviours, which notably decreases the time spent choosing an action; as a result, it improves the amount of reward received and produces acceptable results on this criterion.
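The abstract does not say how the weighting matrix enters the Q-learning update, so the following is only a minimal sketch under stated assumptions, not the authors' WMat algorithm. It assumes a hypothetical per-state weight row `weights[s]` in which a weight of zero removes an action from consideration and positive weights bias exploration. A toy five-state corridor stands in for the Taxi environment, and all names (`step`, `choose_action`, `train`) are illustrative.

```python
import random

N_STATES, N_ACTIONS, GOAL = 5, 3, 4  # actions: 0 = left, 1 = right, 2 = stay


def step(state, action):
    """Toy corridor dynamics (a stand-in for Taxi, not the real environment)."""
    if action == 0:
        nxt = max(state - 1, 0)
    elif action == 1:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = state
    done = nxt == GOAL
    return nxt, (20.0 if done else -1.0), done


def choose_action(q_row, w_row, epsilon, rng):
    """Epsilon-greedy choice restricted to actions with positive weight (assumption)."""
    allowed = [a for a, w in enumerate(w_row) if w > 0]
    if rng.random() < epsilon:
        # Explore: sample allowed actions in proportion to their weights.
        return rng.choices(allowed, weights=[w_row[a] for a in allowed])[0]
    # Exploit: best Q-value among allowed actions; ties go to the lowest index.
    return max(allowed, key=lambda a: q_row[a])


def train(weights, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning where `weights` limits which actions are considered."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:  # step cap keeps every episode finite
            action = choose_action(q[state], weights[state], epsilon, rng)
            nxt, reward, done = step(state, action)
            # Bootstrap only from actions the weights allow in the next state.
            best_next = max(q[nxt][a] for a in range(N_ACTIONS) if weights[nxt][a] > 0)
            target = reward if done else reward + gamma * best_next
            q[state][action] += alpha * (target - q[state][action])
            state, steps = nxt, steps + 1
    return q


# Hypothetical weights: "stay" is never useful in the corridor, so it gets weight 0.
W = [[1.0, 1.0, 0.0] for _ in range(N_STATES)]
q_table = train(W)
```

Zeroing a weight shrinks the set of actions scanned at each step, which is one plausible reading of the abstract's claim about reduced action-selection time. Every state needs at least one positive weight, otherwise `choose_action` has nothing to pick from.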