Title: Cooperative Multi-agent Inverse Reinforcement Learning Based on Selfish Expert and its Behavior Archives
Authors: Yukiko Fukumoto, Masakazu Tadokoro, K. Takadama
Published in: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), December 2020
DOI: 10.1109/SSCI47803.2020.9308491 (https://doi.org/10.1109/SSCI47803.2020.9308491)
Citations: 1
Abstract
This paper explores a multi-agent inverse reinforcement learning (MAIRL) method that enables agents to acquire cooperative behaviors from selfish expert behaviors (i.e., behaviors generated from the viewpoint of a single agent). Since such selfish expert behaviors may not yield cooperative behaviors among agents, this paper tackles the problem by archiving the cooperative behaviors found during the learning process and replacing the original expert behaviors with the archived ones at a fixed interval. To this end, the paper proposes AMAIRL (Archive Multi-Agent Inverse Reinforcement Learning). Intensive simulations on a maze problem revealed the following implications: (1) AMAIRL is superior to MaxEntIRL in terms of finding cooperative behavior; (2) AMAIRL requires a long replacement interval to acquire cooperative behaviors. In particular, AMAIRL with a long interval can find cooperative behaviors that are hard to find with a short interval.
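The archive-and-replace idea described in the abstract can be sketched as an outer loop around any IRL learner: keep the most cooperative behavior seen so far in an archive, and every `interval` iterations substitute the archived behavior for the expert demonstrations. The sketch below is only illustrative and is not the paper's implementation; the IRL step, the cooperation score, and all names (`amairl_sketch`, `interval`, etc.) are assumptions standing in for the components the paper leaves to its full text.

```python
import random

def amairl_sketch(expert_demos, n_iters=60, interval=20, seed=0):
    """Illustrative AMAIRL-style outer loop (not the paper's code):
    learn from expert demos, archive the most cooperative behavior
    seen so far, and periodically swap the demos for the archive."""
    rng = random.Random(seed)
    demos = list(expert_demos)            # current "expert" behaviors
    archive, best_score = None, float("-inf")
    history = []
    for t in range(1, n_iters + 1):
        # Stand-in for one IRL + policy-learning step (e.g. MaxEntIRL):
        # here we merely perturb the demos to produce a candidate behavior.
        candidate = [x + rng.gauss(0.0, 0.1) for x in demos]
        # Stand-in cooperation measure (higher is more cooperative).
        score = -sum(abs(x) for x in candidate)
        if score > best_score:            # archive the best behavior so far
            best_score, archive = score, candidate
        if t % interval == 0 and archive is not None:
            demos = list(archive)         # replace expert with archived behavior
        history.append(best_score)
    return demos, best_score, history

demos, best, hist = amairl_sketch([1.0, -1.0, 0.5])
```

A longer `interval` lets the learner explore further around the current demonstrations before they are overwritten, which loosely mirrors the paper's finding that long intervals uncover cooperative behaviors that short intervals miss.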