Unlearnable Games and “Satisficing” Decisions: A Simple Model for a Complex World

IF 11.6 | Zone 1, Physics and Astrophysics | Q1, PHYSICS, MULTIDISCIPLINARY

Physical Review X | Pub Date: 2024-06-06 | DOI: 10.1103/physrevx.14.021039

Jérôme Garnier-Brun, Michael Benzaquen, Jean-Philippe Bouchaud

Citations: 0

Abstract

As a schematic model of the complexity economic agents are confronted with, we introduce the “Sherrington-Kirkpatrick game,” a discrete time binary choice model inspired from mean-field spin glasses. We show that, even in a completely static environment, agents are unable to learn collectively optimal strategies. This is either because the learning process gets trapped in a suboptimal fixed point or because learning never converges and leads to a never-ending evolution of agent intentions. Contrarily to the hope that learning might save the standard “rational expectation” framework in economics, we argue that complex situations are generically unlearnable and agents must do with satisficing solutions, as argued long ago by Simon [Q. J. Econ. 69, 99 (1955)]. Only a centralized, omniscient agent endowed with enormous computing power could qualify to determine the optimal strategy of all agents. Using a mix of analytical arguments and numerical simulations, we find that (i) long memory of past rewards is beneficial to learning, whereas overreaction to recent past is detrimental and leads to cycles or chaos; (ii) increased competition (nonreciprocity) destabilizes fixed points and leads first to chaos and, in the high competition limit, to quasicycles; (iii) some amount of randomness in the learning process, perhaps paradoxically, allows the system to reach better collective decisions; (iv) nonstationary, “aging” behavior spontaneously emerges in a large swath of parameter space of our complex but static world. On the positive side, we find that the learning process allows cooperative systems to coordinate around satisficing solutions with rather high (but markedly suboptimal) average reward. However, hypersensitivity to the game parameters makes it impossible to predict ex ante who will be better or worse off in our stylized economy. The statistical description of the space of satisficing solutions is an open problem.
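
The abstract describes the ingredients of the model (binary choices, random interactions with a tunable degree of nonreciprocity, memory of past rewards, and randomness in the learning process) but not the exact update rule. The Python sketch below is one possible reading under stated assumptions, not the authors' implementation: the names `eps` (nonreciprocity), `alpha` (memory loss rate), and `beta` (intensity of choice), the Gaussian couplings, the exponential moving average of rewards, and the logit choice rule are all assumptions made for illustration.

```python
import math
import random


def make_couplings(n, eps, rng):
    """Random interaction matrix J with zero diagonal (assumed parametrization).

    eps = 0 gives a symmetric (fully reciprocal) matrix,
    eps = 1 an antisymmetric (fully competitive) one.
    Entries are scaled so that each has variance 1/n.
    """
    J = [[0.0] * n for _ in range(n)]
    scale = math.sqrt(n * ((1 - eps) ** 2 + eps ** 2))
    for i in range(n):
        for j in range(i + 1, n):
            s = rng.gauss(0.0, 1.0)  # reciprocal part
            a = rng.gauss(0.0, 1.0)  # nonreciprocal part
            J[i][j] = ((1 - eps) * s + eps * a) / scale
            J[j][i] = ((1 - eps) * s - eps * a) / scale
    return J


def simulate(n=40, eps=0.0, alpha=0.1, beta=5.0, steps=300, seed=0):
    """Return the average reward per agent at each round (assumed dynamics)."""
    rng = random.Random(seed)
    J = make_couplings(n, eps, rng)
    S = [rng.choice((-1, 1)) for _ in range(n)]  # current binary choices
    Q = [0.0] * n                                # remembered reward estimates
    rewards = []
    for _ in range(steps):
        # Payoff per unit of choice for agent i, given everyone's current choices.
        field = [sum(J[i][j] * S[j] for j in range(n)) for i in range(n)]
        rewards.append(sum(S[i] * field[i] for i in range(n)) / n)
        # Exponential memory: small alpha means long memory of past rewards.
        Q = [(1 - alpha) * Q[i] + alpha * field[i] for i in range(n)]
        # Noisy (logit) choice: P(S_i = +1) = (1 + tanh(beta * Q_i)) / 2.
        S = [1 if rng.random() < 0.5 * (1.0 + math.tanh(beta * Q[i])) else -1
             for i in range(n)]
    return rewards


if __name__ == "__main__":
    for alpha in (0.05, 1.0):  # long memory vs. reacting only to the last round
        tail = simulate(alpha=alpha)[-100:]
        print(f"alpha={alpha}: mean reward, last 100 rounds = {sum(tail) / len(tail):.3f}")
```

Varying `alpha`, `eps`, and `beta` in this toy version is a way to explore the qualitative questions raised in the abstract (memory length, competition, noise); whether it reproduces the reported regimes depends on how closely these assumed rules match the paper's actual model.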

Source journal

Physical Review X (PHYSICS, MULTIDISCIPLINARY)

CiteScore: 24.60

Self-citation rate: 1.60%

Articles published: 197

Review time: 3 months

About the journal: Physical Review X (PRX) is an exclusively online, fully open-access journal that emphasizes innovation, quality, and enduring impact in the content it publishes. It showcases a curated selection of papers from pure, applied, and interdisciplinary physics, aiming to feature work with the potential to shape current and future research and to leave a lasting impact in its field. PRX covers the entire spectrum of physics subject areas, with a special focus on groundbreaking interdisciplinary research of broad influence.