Granular Q-learning adaptation boosts collective welfare in multi-agent Prisoner’s Dilemma

IF 5.6 · Zone 1 (Mathematics) · JCR Q1: MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Chaos Solitons & Fractals Pub Date : 2025-06-18 DOI:10.1016/j.chaos.2025.116642

Hsuan-Wei Lee , Yi-Ning Weng

Chaos Solitons & Fractals, volume 199, Article 116642. Journal Article. https://www.sciencedirect.com/science/article/pii/S0960077925006551

Citations: 0

Abstract



Understanding how cooperation emerges and stabilizes in a difficult environment is a core challenge across biology, physics, and the social sciences. We present a reinforcement-learning framework for the Prisoner’s Dilemma Game between the two distinct agent types: Interactive Identity (II) and Interactive Diversity (ID). While II agents compress all neighbor interactions into one strategy update, ID agents assign one strategy to each neighbor, enabling finer-grained strategic adaptation. We systematically sweep dilemma strengths and analyze both homogeneous and heterogeneous network structures to show that ID agents persistently outcompete II agents at sustaining cooperation, especially for moderate temptations to defect. Moreover, in scenarios where agents can shift from II to ID based on relative payoffs, ID learning often invades populations of II learners, though influential hub nodes can impede this transition in heterogeneous networks. Spatiotemporal analyses indicate that ID agents form a strong cluster of cooperation, which prevents defection from spreading. Finally, extrapolating these results to wider moral dimensions, such as honesty, trust, and punishment, can give a rich understanding of how this granular, neighbor-specific learning raises collective welfare within both natural ecosystems and engineered multi-agent systems.
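The difference between the two agent types can be made concrete with a small sketch. The abstract does not give the payoff matrix, the state representation, or the learning parameters, so everything below is an assumption for illustration: a weak Prisoner's Dilemma with R = 1, T = b, S = P = 0, stateless Q-values, epsilon-greedy action choice, and hypothetical class names `QTable`, `IIAgent`, and `IDAgent`. The II agent here learns from its mean payoff over neighbors, which is also an assumed choice rather than the authors' rule.

```python
import random

ACTIONS = ("C", "D")  # cooperate, defect


def payoff(mine, theirs, b=1.2):
    """Assumed weak Prisoner's Dilemma: R=1, T=b, S=P=0."""
    if mine == "C":
        return 1.0 if theirs == "C" else 0.0
    return b if theirs == "C" else 0.0


class QTable:
    """Stateless Q-values over {C, D} with epsilon-greedy choice (assumed form)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.05, rng=None):
        self.q = {a: 0.0 for a in ACTIONS}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()

    def choose(self):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[a])

    def update(self, action, reward):
        # Standard Q-learning update with a single (implicit) state.
        target = reward + self.gamma * max(self.q.values())
        self.q[action] += self.alpha * (target - self.q[action])


class IIAgent:
    """Interactive Identity: one Q-table, one action played against all neighbors."""

    def __init__(self, neighbors, **kw):
        self.neighbors = list(neighbors)
        self.table = QTable(**kw)

    def act(self):
        action = self.table.choose()
        return {n: action for n in self.neighbors}

    def learn(self, my_actions, their_actions, b=1.2):
        if not self.neighbors:
            return
        # All interactions are compressed into one update (mean payoff assumed).
        action = my_actions[self.neighbors[0]]
        total = sum(payoff(action, their_actions[n], b) for n in self.neighbors)
        self.table.update(action, total / len(self.neighbors))


class IDAgent:
    """Interactive Diversity: a separate Q-table, and action, per neighbor."""

    def __init__(self, neighbors, **kw):
        self.tables = {n: QTable(**kw) for n in neighbors}

    def act(self):
        return {n: t.choose() for n, t in self.tables.items()}

    def learn(self, my_actions, their_actions, b=1.2):
        # Each neighbor-specific table is updated from that interaction alone.
        for n, table in self.tables.items():
            table.update(my_actions[n], payoff(my_actions[n], their_actions[n], b))
```

In this sketch, an `IDAgent` can end up cooperating with one neighbor while defecting against another, whereas an `IIAgent` always plays a single action against its whole neighborhood. That is the "finer-grained strategic adaptation" the abstract refers to, although the paper's actual update rule may differ.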


Source journal

Chaos Solitons & Fractals (Physics; Mathematics, Interdisciplinary Applications)

CiteScore: 13.20
Self-citation rate: 10.30%
Articles published: 1087
Review time: 9 months

About the journal: Chaos, Solitons & Fractals strives to establish itself as a premier journal in the interdisciplinary realm of Nonlinear Science, Non-equilibrium, and Complex Phenomena. It welcomes submissions covering a broad spectrum of topics within this field, including dynamics, non-equilibrium processes in physics, chemistry, and geophysics, complex matter and networks, mathematical models, computational biology, applications to quantum and mesoscopic phenomena, fluctuations and random processes, self-organization, and social phenomena.