VMAPD: Generate Diverse Solutions for Multi-Agent Games with Recurrent Trajectory Discriminators

Shiyu Huang, Chao Yu, Bin Wang, Dong Li, Yu Wang, Tingling Chen, Jun Zhu

2022 IEEE Conference on Games (CoG), published 2022-08-21. DOI: 10.1109/CoG51982.2022.9893722
Recent algorithms designed for multi-agent tasks focus on finding a single optimal solution for all the agents. However, many tasks (e.g., matrix games and transportation dispatching) admit more than one optimal solution, and previous algorithms converge to only one of them. In many practical applications, it is important to develop agents with diverse yet reasonable behaviors. In this paper, we propose "variational multi-agent policy diversification" (VMAPD), an on-policy framework for discovering diverse coordination patterns among multiple agents. By introducing latent variables and exploiting the connection between variational inference and multi-agent reinforcement learning, we derive a tractable evidence lower bound (ELBO) on the trajectories of all agents. Our algorithm maximizes the derived lower bound with policy iteration and can be implemented simply by adding a pseudo reward during centralized training; the trained agents do not need access to the pseudo reward during decentralized execution. We demonstrate the effectiveness of our algorithm on several popular multi-agent testbeds. Experimental results show that VMAPD finds more solutions than other baselines at similar sample complexity.
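The abstract does not include the authors' implementation, but the mechanism it describes, a recurrent discriminator over joint trajectories whose log-likelihood of the latent variable serves as a pseudo reward during centralized training, builds on the standard variational lower bound on mutual information, I(tau; z) >= E[log q(z | tau)] + H(z). Below is a minimal sketch of that mechanism in PyTorch; all module names, shapes, and hyperparameters are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecurrentTrajectoryDiscriminator(nn.Module):
    """GRU over the joint trajectory; predicts which latent mode z produced it."""

    def __init__(self, input_dim: int, hidden_dim: int, num_modes: int):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_modes)

    def forward(self, trajectory: torch.Tensor) -> torch.Tensor:
        # trajectory: (batch, time, input_dim), e.g. the concatenated joint
        # observations and actions of all agents at each step.
        out, _ = self.gru(trajectory)
        return self.head(out)  # per-step logits over z: (batch, time, num_modes)


def pseudo_reward(disc: RecurrentTrajectoryDiscriminator,
                  trajectory: torch.Tensor,
                  z: torch.Tensor,
                  log_p_z: float) -> torch.Tensor:
    """Diversity bonus log q(z | trajectory up to t) - log p(z).

    Added to the task reward only during centralized training; the executing
    agents never observe it, matching the centralized-training /
    decentralized-execution setup described in the abstract.
    """
    with torch.no_grad():
        log_q = F.log_softmax(disc(trajectory), dim=-1)      # (B, T, K)
        idx = z.view(-1, 1, 1).expand(-1, log_q.size(1), 1)  # (B, T, 1), z: (B,) long
        log_q_z = log_q.gather(-1, idx).squeeze(-1)          # (B, T)
    return log_q_z - log_p_z
```

In such a setup, the discriminator itself would be trained with a cross-entropy loss to recover the sampled z from rollouts, and the bonus would be scaled and added to the environment reward before each policy-iteration update.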