Self organizing optimization and phase transition in reinforcement learning minority game system

IF 5.3; Zone 2, Physics and Astrophysics; Q1, PHYSICS, MULTIDISCIPLINARY

Frontiers of Physics. Pub Date: 2024-01-24. DOI: 10.1007/s11467-023-1378-z

Si-Ping Zhang, Jia-Qi Dong, Hui-Yu Zhang, Yi-Xuan Lü, Jue Wang, Zi-Gang Huang

Frontiers of Physics, volume 19, issue 4.

Citations: 0

Abstract

Whether a complex game system composed of a large number of artificial intelligence (AI) agents empowered with reinforcement learning can produce highly favorable collective behavior through agent self-exploration alone is a question of practical importance. In this paper, we address this question by combining a typical theoretical model of resource allocation systems, the minority game, with reinforcement learning. Each individual participating in the game is given a certain degree of intelligence based on a reinforcement learning algorithm. In particular, we demonstrate that as the AI agents gradually become familiar with the unknown environment and try to choose optimal actions to maximize payoff, the whole system continues to approach the optimal state under certain parameter combinations: herding is effectively suppressed by an oscillating collective behavior, a self-organizing pattern that arises without any external interference. Interestingly, numerical results reveal a first-order phase transition in our multi-agent system with reinforcement learning. To further understand the dynamic behavior of agent learning, we define and analyze the conversion path of belief modes, and find that a self-organizing condensation of belief modes appears for given trial-and-error rates in the AI system. Finally, we provide a method, based on the Kullback–Leibler divergence, for detecting the emergence of the period-two oscillating collective pattern, and give the parameter values at which the period-two oscillation appears.
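To make the setup in the abstract concrete, the following is a minimal sketch of a minority game played by Q-learning agents, together with one possible Kullback–Leibler score for period-two oscillation. It is not the authors' implementation. Everything not stated in the abstract is an assumption made for illustration: the state (the previous winning side), the reward (1 for the minority side, 0 otherwise), the epsilon-greedy exploration, the parameter values, and the idea of comparing attendance histograms at even and odd time steps. The names `simulate`, `histogram`, `kl_divergence`, and `period_two_score` are hypothetical.

```python
import math
import random


def simulate(n_agents=101, steps=2000, alpha=0.1, gamma=0.9, epsilon=0.02, seed=0):
    """Run a Q-learning minority game; return the number of agents on side 1 per step.

    Assumptions (not from the paper): state = previous winning side,
    reward = 1 for choosing the minority side, epsilon-greedy exploration.
    """
    rng = random.Random(seed)
    # q[agent][state][action], all initialised to zero
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_agents)]
    state = rng.randrange(2)
    attendance = []
    for _ in range(steps):
        actions = []
        for i in range(n_agents):
            q0, q1 = q[i][state]
            if rng.random() < epsilon or q0 == q1:
                a = rng.randrange(2)      # explore, or break a tie at random
            else:
                a = int(q1 > q0)          # exploit the better-valued action
            actions.append(a)
        n1 = sum(actions)
        # The minority side wins; ties (only possible for even n_agents) go to side 0.
        winner = 1 if n1 < n_agents - n1 else 0
        for i, a in enumerate(actions):
            reward = 1.0 if a == winner else 0.0
            target = reward + gamma * max(q[i][winner])
            q[i][state][a] += alpha * (target - q[i][state][a])
        state = winner
        attendance.append(n1)
    return attendance


def histogram(values, n_bins, lo, hi, smooth=1e-9):
    """Normalised histogram with a small additive constant so no bin is zero."""
    counts = [smooth] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def period_two_score(series, n_agents, n_bins=10):
    """Heuristic score: KL divergence between even-step and odd-step attendance.

    If attendance alternates between two levels, the two histograms barely
    overlap and the score is large; without such alternation it is near zero.
    """
    even = histogram(series[0::2], n_bins, 0, n_agents + 1)
    odd = histogram(series[1::2], n_bins, 0, n_agents + 1)
    return kl_divergence(even, odd)


if __name__ == "__main__":
    series = simulate()
    print(period_two_score(series[-500:], n_agents=101))
```

The score is a rough indicator only, and a meaningful threshold would have to be calibrated against runs with known behavior. Whether this toy model reproduces the oscillation or the phase transition reported in the paper depends on parameters the abstract does not give.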


Source journal

Frontiers of Physics (PHYSICS, MULTIDISCIPLINARY)

CiteScore: 9.20

Self-citation rate: 9.30%

Articles published: 898

Review time: 6-12 weeks

About the journal: Frontiers of Physics is an international peer-reviewed journal dedicated to showcasing the latest advancements and significant progress in various research areas within the field of physics. The journal's scope is broad, covering topics that include quantum computation and quantum information; atomic, molecular, and optical physics; condensed matter physics, material sciences, and interdisciplinary research; and particle and nuclear physics, astrophysics, and cosmology. The journal's mission is to highlight frontier achievements, hot topics, and cross-disciplinary points in physics, facilitating communication and idea exchange among physicists both in China and internationally. It serves as a platform for researchers to share their findings and insights, fostering collaboration and innovation across different areas of physics.