The Role of Reinforcement Learning in the Emergence of Conventions: Simulation Experiments with the Repeated Volunteer's Dilemma

H. Nunner, W. Przepiorka, Chris Janssen
{"title":"The Role of Reinforcement Learning in the Emergence of Conventions: Simulation Experiments with the Repeated Volunteer's Dilemma","authors":"H. Nunner, W. Przepiorka, Chris Janssen","doi":"10.18564/jasss.4771","DOIUrl":null,"url":null,"abstract":"We use reinforcement learning models to investigate the role of cognitive mechanisms in the emergence of conventions in the repeated volunteer’s dilemma (VOD). The VOD is amulti-person, binary choice collective goods game in which the contribution of only one individual is necessary and su icient to produce a benefit for the entire group. Behavioral experiments show that in the symmetric VOD,where all groupmembers have the same costs of volunteering, a turn-taking convention emerges, whereas in the asymmetric VOD,where one “strong” group member has lower costs of volunteering, a solitary-volunteering convention emerges with the strong member volunteering most of the time. We compare three di erent classes of reinforcement learningmodels in their ability to replicate these empirical findings. Our results confirm that reinforcement learning models canprovide aparsimonious account of howhumans tacitly agreeonone course of actionwhenencountering each other repeatedly in the same interaction situation. We find that considering contextual clues (i.e., reward structures) for strategy design (i.e., sequences of actions) and strategy selection (i.e., favoring equal distribution of costs) facilitate coordinationwhenoptimaare less salient. Furthermore, ourmodels producebetter fits with the empirical datawhen agents actmyopically (favoring current over expected future rewards) and the rewards for adhering to conventions are not delayed.","PeriodicalId":14675,"journal":{"name":"J. Artif. Soc. Soc. Simul.","volume":"45 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Artif. Soc. Soc. Simul.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18564/jasss.4771","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

We use reinforcement learning models to investigate the role of cognitive mechanisms in the emergence of conventions in the repeated volunteer's dilemma (VOD). The VOD is a multi-person, binary-choice collective goods game in which the contribution of only one individual is necessary and sufficient to produce a benefit for the entire group. Behavioral experiments show that in the symmetric VOD, where all group members have the same costs of volunteering, a turn-taking convention emerges, whereas in the asymmetric VOD, where one "strong" group member has lower costs of volunteering, a solitary-volunteering convention emerges with the strong member volunteering most of the time. We compare three different classes of reinforcement learning models in their ability to replicate these empirical findings. Our results confirm that reinforcement learning models can provide a parsimonious account of how humans tacitly agree on one course of action when encountering each other repeatedly in the same interaction situation. We find that considering contextual clues (i.e., reward structures) for strategy design (i.e., sequences of actions) and strategy selection (i.e., favoring an equal distribution of costs) facilitates coordination when optima are less salient. Furthermore, our models produce better fits with the empirical data when agents act myopically (favoring current over expected future rewards) and the rewards for adhering to conventions are not delayed.
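The payoff structure of the VOD is simple enough to illustrate how a basic reinforcement learner interacts with it. The sketch below shows a minimal Roth-Erev-style learner in a repeated symmetric VOD; the group size, payoff values, and learning rule are illustrative assumptions and do not reproduce the three model classes compared in the paper.

```python
"""
Minimal sketch of a Roth-Erev-style reinforcement learner in a repeated
symmetric volunteer's dilemma (VOD). Illustrative toy model only; the
parameters and update rule are assumptions, not the authors' models.
"""
import random

N_PLAYERS = 4        # group size (assumed)
BENEFIT = 1.0        # public benefit if at least one player volunteers
COST = 0.5           # cost paid by each volunteer (symmetric VOD)
ROUNDS = 500

# One propensity per action per player: index 0 = volunteer, 1 = abstain
propensities = [[1.0, 1.0] for _ in range(N_PLAYERS)]

def choose(props):
    """Pick an action with probability proportional to its propensity."""
    total = sum(props)
    return 0 if random.random() < props[0] / total else 1

for t in range(ROUNDS):
    actions = [choose(p) for p in propensities]
    produced = any(a == 0 for a in actions)   # collective good produced?
    for i, a in enumerate(actions):
        payoff = (BENEFIT if produced else 0.0) - (COST if a == 0 else 0.0)
        # Roth-Erev update: reinforce the chosen action by the payoff received
        propensities[i][a] += max(payoff, 0.0)

# After learning, inspect each player's probability of volunteering
for i, p in enumerate(propensities):
    print(f"player {i}: P(volunteer) = {p[0] / sum(p):.2f}")
```

Because free-riding pays more than volunteering whenever the good is produced, this bare-bones learner tends to drift toward widespread abstention rather than turn-taking; capturing the turn-taking and solitary-volunteering conventions observed in the experiments is precisely what motivates the richer, context-sensitive model classes examined in the paper.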