Applying Reward Design Based on Payment Mechanism to Shaped-Reward DQN for Beer Game
Masaaki Hori, T. Matsui
2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI), published 2022-07-01
DOI: 10.1109/IIAIAAI55812.2022.00083 (https://doi.org/10.1109/IIAIAAI55812.2022.00083)
We focus on the application of multiagent reinforcement learning to supply chain management. The beer game is a representative supply chain management problem and has been studied as a cooperation problem in multiagent systems. In a previous study, SRDQN, a method based on deep reinforcement learning and reward shaping, was applied as a solution to the beer game. In that study, a single agent in the game performs reinforcement learning while taking the other agents into account, with the aim of reducing the global inventory cost. However, other reward shaping techniques could be employed to improve learning stability, and they may also be effective in systems where multiple agents perform reinforcement learning. We apply a reward shaping technique based on mechanism design to SRDQN to improve the cooperative policies, and we empirically evaluate the effectiveness of the proposed approach.