Difference rewards policy gradients.

Impact Factor: 4.5 · CAS Region 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)
Neural Computing & Applications · Published: 2025-01-01 (Epub: 2022-11-11) · DOI: 10.1007/s00521-022-07960-5
Jacopo Castellini, Sam Devlin, Frans A Oliehoek, Rahul Savani
{"title":"Difference rewards policy gradients.","authors":"Jacopo Castellini, Sam Devlin, Frans A Oliehoek, Rahul Savani","doi":"10.1007/s00521-022-07960-5","DOIUrl":null,"url":null,"abstract":"<p><p>Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the <i>Q</i>-function as done by counterfactual multi-agent policy gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference rewards.</p>","PeriodicalId":49766,"journal":{"name":"Neural Computing & Applications","volume":"37 19","pages":"13163-13186"},"PeriodicalIF":4.5000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204931/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computing & Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00521-022-07960-5","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/11/11 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by counterfactual multi-agent policy gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference rewards.
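The abstract stops short of the estimator itself. As a minimal sketch, assuming the standard difference-rewards formulation from the literature (here r is the shared reward function, \mathbf{a}^{-i} the joint action of all agents except i, and c^i a default counterfactual action for agent i; the paper's exact estimator may differ):

\[
\Delta r^i(s, \mathbf{a}) \;=\; r(s, \mathbf{a}) \;-\; r\big(s, (\mathbf{a}^{-i}, c^i)\big)
\]
\[
\nabla_{\theta^i} J \;\approx\; \mathbb{E}\Big[\, \sum_{t} \nabla_{\theta^i} \log \pi^i\big(a^i_t \mid s_t\big) \sum_{l \ge t} \gamma^{\,l-t}\, \Delta r^i(s_l, \mathbf{a}_l) \Big]
\]

Because \Delta r^i differences the known reward function directly, no Q-function critic needs to be learned; in the unknown-reward setting described in the abstract, a learned reward network \hat{r}_\psi would presumably stand in for r in the first equation.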

Source journal: Neural Computing & Applications (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 11.40
Self-citation rate: 8.30%
Articles per year: 1280
Review time: 6.9 months
Journal description: Neural Computing & Applications is an international journal which publishes original research and other information in the field of practical applications of neural computing and related techniques such as genetic algorithms, fuzzy logic and neuro-fuzzy systems. All items relevant to building practical systems are within its scope, including but not limited to: adaptive computing, algorithms, applicable neural networks theory, applied statistics, architectures, artificial intelligence, benchmarks, case histories of innovative applications, fuzzy logic, genetic algorithms, hardware implementations, hybrid intelligent systems, intelligent agents, intelligent control systems, intelligent diagnostics, intelligent forecasting, machine learning, neural networks, neuro-fuzzy systems, pattern recognition, performance measures, self-learning systems, software simulations, supervised and unsupervised learning methods, and system engineering and integration. Featured contributions fall into several categories: Original Articles, Review Articles, Book Reviews and Announcements.