Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity

arXiv - CS - Multiagent Systems Pub Date : 2024-08-08 DOI:arxiv-2408.04549

Martin Smit, Fernando P. Santos


Citations: 0

Abstract


Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity

Altruistic cooperation is costly yet socially desirable. As a result, agents struggle to learn cooperative policies through independent reinforcement learning (RL). Indirect reciprocity, where agents consider their interaction partner's reputation, has been shown to stabilise cooperation in homogeneous, idealised populations. However, more realistic settings comprise heterogeneous agents with different characteristics and group-based social identities. We study cooperation when agents are stratified into two such groups, and allow reputation updates and actions to depend on group information. We consider two modelling approaches: evolutionary game theory, where we comprehensively search for social norms (i.e., rules to assign reputations) leading to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affect the analytically identified equilibria. We observe that a defecting majority leads the minority group to defect, but not the inverse. Moreover, changing the norms that judge in-group and out-group interactions can steer a system towards either fair or unfair cooperation. This is made clearer when moving beyond equilibrium analysis to independent RL agents, where convergence to fair cooperation occurs with a narrower set of norms. Our results highlight that, in heterogeneous populations with reputations, carefully defining interaction norms is fundamental to tackling both dilemmas of cooperation and of fairness.
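To make the idea of a group-dependent social norm concrete, the following is a minimal illustrative sketch, not the authors' model. It assumes a donation game with hypothetical benefit `b` and cost `c`, binary reputations, two textbook norms (stern judging and image scoring), and a fixed "discriminator" strategy that cooperates only with well-reputed partners. All names and parameter values here are assumptions chosen for illustration.

```python
import random

# Binary reputations and actions (assumed encoding).
GOOD, BAD = 1, 0
C, D = 1, 0

# Two textbook norms, used here only as examples.
# A norm maps (donor action, recipient reputation) -> donor's new reputation.
STERN_JUDGING = {(C, GOOD): GOOD, (D, GOOD): BAD, (C, BAD): BAD, (D, BAD): GOOD}
IMAGE_SCORING = {(C, GOOD): GOOD, (D, GOOD): BAD, (C, BAD): GOOD, (D, BAD): BAD}


def judge(action, recipient_rep, same_group, in_norm, out_norm):
    """Assign the donor's new reputation, applying one norm to in-group
    interactions and another to out-group interactions."""
    norm = in_norm if same_group else out_norm
    return norm[(action, recipient_rep)]


def discriminator(recipient_rep):
    """Hypothetical fixed strategy: cooperate only with good recipients."""
    return C if recipient_rep == GOOD else D


def simulate(groups, in_norm, out_norm, rounds=2000, b=5.0, c=1.0, seed=0):
    """Play random pairwise donation games and return the average payoff
    per group label."""
    rng = random.Random(seed)
    n = len(groups)
    reps = [GOOD] * n          # everyone starts with a good reputation
    payoff = [0.0] * n
    for _ in range(rounds):
        donor, recipient = rng.sample(range(n), 2)
        action = discriminator(reps[recipient])
        if action == C:
            payoff[donor] -= c
            payoff[recipient] += b
        reps[donor] = judge(action, reps[recipient],
                            groups[donor] == groups[recipient],
                            in_norm, out_norm)
    averages = {}
    for g in set(groups):
        members = [i for i in range(n) if groups[i] == g]
        averages[g] = sum(payoff[i] for i in members) / len(members)
    return averages
```

For example, `simulate([0] * 6 + [1] * 3, STERN_JUDGING, IMAGE_SCORING)` returns the average payoff of a majority group (label 0) and a minority group (label 1) under different in-group and out-group norms. Comparing the two averages gives a crude fairness measure; a learning-based study would replace the fixed `discriminator` strategy with learned policies.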
