{"title":"多代理强化学习的异步学分分配框架","authors":"Yongheng Liang, Hejun Wu, Haitao Wang, Hao Cai","doi":"arxiv-2408.03692","DOIUrl":null,"url":null,"abstract":"Credit assignment is a core problem that distinguishes agents' marginal\ncontributions for optimizing cooperative strategies in multi-agent\nreinforcement learning (MARL). Current credit assignment methods usually assume\nsynchronous decision-making among agents. However, a prerequisite for many\nrealistic cooperative tasks is asynchronous decision-making by agents, without\nwaiting for others to avoid disastrous consequences. To address this issue, we\npropose an asynchronous credit assignment framework with a problem model called\nADEX-POMDP and a multiplicative value decomposition (MVD) algorithm. ADEX-POMDP\nis an asynchronous problem model with extra virtual agents for a decentralized\npartially observable markov decision process. We prove that ADEX-POMDP\npreserves both the task equilibrium and the algorithm convergence. MVD utilizes\nmultiplicative interaction to efficiently capture the interactions of\nasynchronous decisions, and we theoretically demonstrate its advantages in\nhandling asynchronous tasks. Experimental results show that on two asynchronous\ndecision-making benchmarks, Overcooked and POAC, MVD not only consistently\noutperforms state-of-the-art MARL methods but also provides the\ninterpretability for asynchronous cooperation.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"39 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning\",\"authors\":\"Yongheng Liang, Hejun Wu, Haitao Wang, Hao Cai\",\"doi\":\"arxiv-2408.03692\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Credit assignment is a core problem that distinguishes agents' marginal\\ncontributions for optimizing cooperative strategies in multi-agent\\nreinforcement learning (MARL). Current credit assignment methods usually assume\\nsynchronous decision-making among agents. However, a prerequisite for many\\nrealistic cooperative tasks is asynchronous decision-making by agents, without\\nwaiting for others to avoid disastrous consequences. To address this issue, we\\npropose an asynchronous credit assignment framework with a problem model called\\nADEX-POMDP and a multiplicative value decomposition (MVD) algorithm. ADEX-POMDP\\nis an asynchronous problem model with extra virtual agents for a decentralized\\npartially observable markov decision process. We prove that ADEX-POMDP\\npreserves both the task equilibrium and the algorithm convergence. MVD utilizes\\nmultiplicative interaction to efficiently capture the interactions of\\nasynchronous decisions, and we theoretically demonstrate its advantages in\\nhandling asynchronous tasks. 
Experimental results show that on two asynchronous\\ndecision-making benchmarks, Overcooked and POAC, MVD not only consistently\\noutperforms state-of-the-art MARL methods but also provides the\\ninterpretability for asynchronous cooperation.\",\"PeriodicalId\":501315,\"journal\":{\"name\":\"arXiv - CS - Multiagent Systems\",\"volume\":\"39 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multiagent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.03692\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.03692","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Asynchronous Credit Assignment Framework for Multi-Agent Reinforcement Learning
Credit assignment is a core problem in multi-agent reinforcement learning (MARL): it distinguishes each agent's marginal contribution so that cooperative strategies can be optimized. Current credit assignment methods usually assume synchronous decision-making among agents. However, many realistic cooperative tasks require agents to decide asynchronously, acting without waiting for others in order to avoid disastrous consequences. To address this issue, we propose an asynchronous credit assignment framework consisting of a problem model called ADEX-POMDP and a multiplicative value decomposition (MVD) algorithm. ADEX-POMDP is an asynchronous problem model that augments the decentralized partially observable Markov decision process with extra virtual agents. We prove that ADEX-POMDP preserves both the task equilibrium and the algorithm convergence. MVD uses multiplicative interactions to efficiently capture the dependencies among asynchronous decisions, and we theoretically demonstrate its advantages in handling asynchronous tasks. Experimental results on two asynchronous decision-making benchmarks, Overcooked and POAC, show that MVD not only consistently outperforms state-of-the-art MARL methods but also provides interpretability for asynchronous cooperation.
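For intuition, below is a minimal, hypothetical sketch (not the paper's actual MVD mixing network) contrasting an additive, VDN-style value decomposition with a simple multiplicative one. In the multiplicative form each agent's utility enters as a factor, so an agent that takes no decision at the current asynchronous step (for example, a virtual placeholder agent) leaves the joint value of the remaining agents intact. All function names, the mixing form, and the weights are illustrative assumptions.

# Hypothetical sketch: additive vs. multiplicative value decomposition.
# Not the paper's MVD; a toy illustration of multiplicative interaction.
import numpy as np

def additive_mix(utilities: np.ndarray) -> float:
    """VDN-style joint value: Q_tot = sum_i Q_i (no interaction terms)."""
    return float(np.sum(utilities))

def multiplicative_mix(utilities: np.ndarray, weights: np.ndarray) -> float:
    """Toy multiplicative joint value: Q_tot = prod_i (1 + w_i * Q_i) - 1.
    Each agent's factor rescales the others', so an agent with zero utility
    (e.g. a virtual 'no decision' agent at an asynchronous step) contributes
    a factor of 1 and leaves the remaining agents' joint value unchanged."""
    return float(np.prod(1.0 + weights * utilities) - 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=3)      # per-agent utilities Q_1..Q_3 (random stand-ins)
    w = np.full(3, 0.5)         # mixing weights (assumed, not from the paper)
    print("additive      :", additive_mix(q))
    print("multiplicative:", multiplicative_mix(q, w))
    q_idle = q.copy()
    q_idle[1] = 0.0             # agent 2 makes no decision at this step
    print("multiplicative with idle agent:", multiplicative_mix(q_idle, w))

Under this toy form, zeroing one agent's utility multiplies the joint value by 1, so it neither adds nor removes credit for the others, which is the kind of behavior one would expect from the virtual agents the abstract introduces for asynchronous steps.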