差异奖励政策梯度。

IF 4.5 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computing & Applications Pub Date : 2025-01-01 Epub Date: 2022-11-11 DOI:10.1007/s00521-022-07960-5

Jacopo Castellini, Sam Devlin, Frans A Oliehoek, Rahul Savani

{"title":"差异奖励政策梯度。","authors":"Jacopo Castellini, Sam Devlin, Frans A Oliehoek, Rahul Savani","doi":"10.1007/s00521-022-07960-5","DOIUrl":null,"url":null,"abstract":"Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by counterfactual multi-agent policy gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference rewards.","PeriodicalId":49766,"journal":{"name":"Neural Computing & Applications","volume":"37 19","pages":"13163-13186"},"PeriodicalIF":4.5000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204931/pdf/","citationCount":"0","resultStr":"{\"title\":\"Difference rewards policy gradients.\",\"authors\":\"Jacopo Castellini, Sam Devlin, Frans A Oliehoek, Rahul Savani\",\"doi\":\"10.1007/s00521-022-07960-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by counterfactual multi-agent policy gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference rewards.\",\"PeriodicalId\":49766,\"journal\":{\"name\":\"Neural Computing & Applications\",\"volume\":\"37 19\",\"pages\":\"13163-13186\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2025-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204931/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Computing & Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s00521-022-07960-5\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2022/11/11 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computing & Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00521-022-07960-5","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/11/11 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

策略梯度方法已成为多智能体强化学习中最流行的算法之一。然而，许多方法都没有解决的一个关键挑战是多代理信用分配：评估代理对整体性能的贡献，这对于学习好的策略至关重要。我们提出了一种名为Dr.Reinforce的新算法，它通过将差异奖励与策略梯度相结合来明确解决这个问题，以便在奖励函数已知的情况下学习分散的策略。通过直接区分奖励函数，Dr.Reinforce避免了与学习q函数相关的困难，这与反事实多代理策略梯度（反事实多代理策略梯度，一种最先进的差异奖励方法）是一样的。对于奖励函数未知的应用程序，我们展示了Dr.Reinforce的一个版本的有效性，它学习了一个额外的奖励网络，用于估计不同的奖励。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Difference rewards policy gradients.

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by counterfactual multi-agent policy gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference rewards.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Neural Computing & Applications 工程技术-计算机：人工智能

CiteScore

11.40

自引率

8.30%

发文量

1280

审稿时长

6.9 months

期刊介绍： Neural Computing & Applications is an international journal which publishes original research and other information in the field of practical applications of neural computing and related techniques such as genetic algorithms, fuzzy logic and neuro-fuzzy systems. All items relevant to building practical systems are within its scope, including but not limited to: -adaptive computing- algorithms- applicable neural networks theory- applied statistics- architectures- artificial intelligence- benchmarks- case histories of innovative applications- fuzzy logic- genetic algorithms- hardware implementations- hybrid intelligent systems- intelligent agents- intelligent control systems- intelligent diagnostics- intelligent forecasting- machine learning- neural networks- neuro-fuzzy systems- pattern recognition- performance measures- self-learning systems- software simulations- supervised and unsupervised learning methods- system engineering and integration. Featured contributions fall into several categories: Original Articles, Review Articles, Book Reviews and Announcements.