Applying Reward Design Based on Payment Mechanism to Shaped-Reward DQN for Beer Game
Masaaki Hori, T. Matsui
2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI), published 2022-07-01
DOI: 10.1109/IIAIAAI55812.2022.00083
Citations: 0
Abstract
We focus on the application of multiagent reinforcement learning to supply chain management. The beer game is an example of a supply chain management problem and has been studied as a cooperation problem in multiagent systems. In a previous study, SRDQN, a method based on deep reinforcement learning and reward shaping, was applied to the beer game. In that study, only a single agent in the game performs reinforcement learning, taking the other agents into account, to reduce the global cost of beer inventories. However, other reward shaping techniques could be employed to improve learning stability, and such techniques may also be effective in systems in which multiple agents perform reinforcement learning. We apply a reward shaping technique based on mechanism design to SRDQN to improve the cooperative policies, and we then empirically evaluate the effectiveness of the proposed approach.
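The "global cost" that the agents try to reduce can be made concrete with a short sketch. It assumes the common beer-game cost structure, in which each agent (retailer, wholesaler, distributor, manufacturer) pays a linear holding cost on positive inventory and a linear backorder cost on unmet demand in every period, and the global cost is the sum over all agents and periods. The function names, the signed inventory-level representation, and the cost values in the usage example are illustrative assumptions, not details taken from the paper; this is not SRDQN or the proposed payment-based reward.

```python
from typing import Sequence


def echelon_cost(inventory_level: int, holding_cost: float, backorder_cost: float) -> float:
    """Cost of one agent in one period (assumed linear cost structure).

    inventory_level is signed: positive values are units on hand,
    negative values are backordered (unfilled) units.
    """
    on_hand = max(inventory_level, 0)
    backlog = max(-inventory_level, 0)
    return holding_cost * on_hand + backorder_cost * backlog


def global_cost(
    levels: Sequence[Sequence[int]],
    holding_costs: Sequence[float],
    backorder_costs: Sequence[float],
) -> float:
    """Team objective: sum of every agent's cost over every period.

    levels[t][i] is the (hypothetical) inventory level of agent i in period t.
    """
    total = 0.0
    for period in levels:
        for i, level in enumerate(period):
            total += echelon_cost(level, holding_costs[i], backorder_costs[i])
    return total


if __name__ == "__main__":
    # Illustrative values only: four agents, two periods.
    holding = [0.5, 0.5, 0.5, 0.5]
    backorder = [1.0, 1.0, 1.0, 1.0]
    trajectory = [
        [4, -2, 0, 6],   # period 1
        [-1, 3, 2, 0],   # period 2
    ]
    print(global_cost(trajectory, holding, backorder))  # prints 10.5
```

Because each agent's own cost is only one summand of this total, an agent that minimizes its local cost alone need not minimize the global cost, which is why the beer game is treated as a cooperation problem.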