Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity

Martin Smit, Fernando P. Santos
{"title":"Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity","authors":"Martin Smit, Fernando P. Santos","doi":"arxiv-2408.04549","DOIUrl":null,"url":null,"abstract":"Altruistic cooperation is costly yet socially desirable. As a result, agents\nstruggle to learn cooperative policies through independent reinforcement\nlearning (RL). Indirect reciprocity, where agents consider their interaction\npartner's reputation, has been shown to stabilise cooperation in homogeneous,\nidealised populations. However, more realistic settings are comprised of\nheterogeneous agents with different characteristics and group-based social\nidentities. We study cooperation when agents are stratified into two such\ngroups, and allow reputation updates and actions to depend on group\ninformation. We consider two modelling approaches: evolutionary game theory,\nwhere we comprehensively search for social norms (i.e., rules to assign\nreputations) leading to cooperation and fairness; and RL, where we consider how\nthe stochastic dynamics of policy learning affects the analytically identified\nequilibria. We observe that a defecting majority leads the minority group to\ndefect, but not the inverse. Moreover, changing the norms that judge in and\nout-group interactions can steer a system towards either fair or unfair\ncooperation. This is made clearer when moving beyond equilibrium analysis to\nindependent RL agents, where convergence to fair cooperation occurs with a\nnarrower set of norms. Our results highlight that, in heterogeneous populations\nwith reputations, carefully defining interaction norms is fundamental to tackle\nboth dilemmas of cooperation and of fairness.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"13 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.04549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Altruistic cooperation is costly yet socially desirable. As a result, agents struggle to learn cooperative policies through independent reinforcement learning (RL). Indirect reciprocity, where agents consider their interaction partner's reputation, has been shown to stabilise cooperation in homogeneous, idealised populations. However, more realistic settings comprise heterogeneous agents with different characteristics and group-based social identities. We study cooperation when agents are stratified into two such groups, and allow reputation updates and actions to depend on group information. We consider two modelling approaches: evolutionary game theory, where we comprehensively search for social norms (i.e., rules to assign reputations) leading to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affects the analytically identified equilibria. We observe that a defecting majority leads the minority group to defect, but not the inverse. Moreover, changing the norms that judge in- and out-group interactions can steer a system towards either fair or unfair cooperation. This becomes clearer when moving beyond equilibrium analysis to independent RL agents, where convergence to fair cooperation occurs under a narrower set of norms. Our results highlight that, in heterogeneous populations with reputations, carefully defining interaction norms is fundamental to tackling both the dilemma of cooperation and the dilemma of fairness.
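To make the setting concrete, the following minimal Python sketch illustrates the kind of model the abstract describes: a donation game with binary reputations, second-order social norms that may differ for in-group and out-group interactions, and independent learners that condition on the recipient's group and reputation. Everything here is an illustrative assumption rather than the paper's actual construction: the payoff and learning parameters, the group sizes, the choice of stern judging for in-group and an all-forgiving norm for out-group interactions, and the crude one-round return used in the value update.

    import random
    from collections import defaultdict

    # Illustrative parameters; none of these values are taken from the paper.
    BENEFIT, COST = 4.0, 1.0     # donation game: donor pays COST, recipient gains BENEFIT
    EPSILON, ALPHA = 0.1, 0.05   # exploration rate and value-update step size
    MAJORITY, MINORITY = 40, 10  # a stratified population with a group imbalance

    # A second-order norm maps (donor action, recipient reputation) to the donor's
    # new binary reputation (1 = good, 0 = bad). Under "stern judging", cooperating
    # with the bad, or defecting against the good, is itself judged bad.
    STERN_JUDGING = {("C", 1): 1, ("C", 0): 0, ("D", 1): 0, ("D", 0): 1}
    ALL_GOOD      = {("C", 1): 1, ("C", 0): 1, ("D", 1): 1, ("D", 0): 1}

    class Agent:
        def __init__(self, group):
            self.group = group
            self.reputation = 1
            # Independent learner: action values keyed by the recipient's group,
            # the recipient's reputation, and the candidate action.
            self.q = defaultdict(float)

        def act(self, recipient):
            state = (recipient.group, recipient.reputation)
            if random.random() < EPSILON:
                return random.choice("CD")
            return max("CD", key=lambda a: self.q[state + (a,)])

    def round_robin(agents, norms):
        """Every agent donates once. Each donor is judged by the in-group or
        out-group norm, then updates its action value with the total payoff it
        accrued this round (a crude stand-in for the RL return)."""
        payoff = defaultdict(float)
        moves = []
        for donor in agents:
            recipient = random.choice([a for a in agents if a is not donor])
            state = (recipient.group, recipient.reputation)
            action = donor.act(recipient)
            if action == "C":
                payoff[id(donor)] -= COST
                payoff[id(recipient)] += BENEFIT
            norm = norms["in"] if donor.group == recipient.group else norms["out"]
            donor.reputation = norm[(action, recipient.reputation)]
            moves.append((donor, state, action))
        for donor, state, action in moves:
            key = state + (action,)
            donor.q[key] += ALPHA * (payoff[id(donor)] - donor.q[key])

    agents = [Agent(0) for _ in range(MAJORITY)] + [Agent(1) for _ in range(MINORITY)]
    norms = {"in": STERN_JUDGING, "out": ALL_GOOD}  # one illustrative in/out-group pair
    for _ in range(5000):
        round_robin(agents, norms)

    for g, size in ((0, MAJORITY), (1, MINORITY)):
        good = sum(a.reputation for a in agents if a.group == g) / size
        print(f"group {g}: share with good reputation = {good:.2f}")

In this simplified representation a second-order norm is just four binary entries, so there are only 2^4 = 16 candidate norms per context, and the exhaustive search the abstract mentions would amount to sweeping 16 × 16 = 256 in-/out-group norm pairs; the paper's actual norm space and learning algorithm may well be richer.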