Provably efficient information-directed sampling algorithms for multi-agent reinforcement learning

IF 5.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)
Qiaosheng Zhang, Chenjia Bai, Shuyue Hu, Zhen Wang, Xuelong Li
{"title":"Provably efficient information-directed sampling algorithms for multi-agent reinforcement learning","authors":"Qiaosheng Zhang ,&nbsp;Chenjia Bai ,&nbsp;Shuyue Hu ,&nbsp;Zhen Wang ,&nbsp;Xuelong Li","doi":"10.1016/j.artint.2025.104392","DOIUrl":null,"url":null,"abstract":"<div><div>This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium. The basic algorithm, referred to as <span>MAIDS</span>, employs an asymmetric learning structure where the max-player first solves a minimax optimization problem based on the <em>joint information ratio</em> of the joint policy, and the min-player then minimizes the <em>marginal information ratio</em> with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of <span><math><mover><mrow><mi>O</mi></mrow><mrow><mo>˜</mo></mrow></mover><mo>(</mo><msqrt><mrow><mi>K</mi></mrow></msqrt><mo>)</mo></math></span> for <em>K</em> episodes. To reduce the computational load of <span>MAIDS</span>, we develop an improved algorithm called <span>Reg-MAIDS</span>, which has the same Bayesian regret bound while enjoying less computational complexity. Moreover, by leveraging the flexibility of IDS principle in choosing the learning target, we propose two methods for constructing compressed environments based on rate-distortion theory, upon which we develop an algorithm <span>Compressed-MAIDS</span> wherein the learning target is a compressed environment. Finally, we extend <span>Reg-MAIDS</span> to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample-efficient manner.</div></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":"348 ","pages":"Article 104392"},"PeriodicalIF":5.1000,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0004370225001110","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample-efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning the Nash equilibrium. The basic algorithm, referred to as MAIDS, employs an asymmetric learning structure in which the max-player first solves a minimax optimization problem based on the joint information ratio of the joint policy, and the min-player then minimizes the marginal information ratio with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of Õ(√K) over K episodes. To reduce the computational load of MAIDS, we develop an improved algorithm called Reg-MAIDS, which has the same Bayesian regret bound while enjoying lower computational complexity. Moreover, by leveraging the flexibility of the IDS principle in choosing the learning target, we propose two methods for constructing compressed environments based on rate-distortion theory, upon which we develop an algorithm, Compressed-MAIDS, wherein the learning target is a compressed environment. Finally, we extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or a coarse correlated equilibrium in a sample-efficient manner.
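As background for readers unfamiliar with IDS, the minimal sketch below illustrates the information-ratio idea in the simplest single-agent setting, a Bernoulli bandit; it is not the paper's MAIDS algorithm. At each step the learner picks the action minimizing the information ratio, squared expected one-step regret divided by an information-gain term, here the standard variance-based proxy of Russo and Van Roy estimated by Monte Carlo over posterior samples. All names and the choice of proxy are illustrative assumptions; the paper's joint and marginal information ratios generalize this quantity to joint policies in Markov games.

```python
import numpy as np

rng = np.random.default_rng(0)

def ids_action(alpha, beta, n_samples=2000):
    """Deterministic, variance-based IDS for a Bernoulli bandit whose arm
    means have independent Beta(alpha[a], beta[a]) posteriors.

    Picks argmin_a  psi(a) = delta(a)^2 / info(a), where delta(a) is the
    expected one-step regret of arm a and info(a) is the variance-based
    information-gain proxy Var_{A*}( E[theta_a | A*] ).
    """
    k = len(alpha)
    theta = rng.beta(alpha, beta, size=(n_samples, k))  # posterior samples
    best = theta.argmax(axis=1)                         # sampled optimal arm A*
    delta = theta.max(axis=1).mean() - theta.mean(axis=0)
    mean_all = theta.mean(axis=0)
    info = np.zeros(k)
    for a_star in range(k):                             # condition on A* = a_star
        mask = best == a_star
        if mask.any():
            info += mask.mean() * (theta[mask].mean(axis=0) - mean_all) ** 2
    info = np.maximum(info, 1e-12)                      # guard against zero gain
    return int(np.argmin(delta ** 2 / info))

# Toy run on a 3-armed bandit with unknown means and uniform priors.
true_means = np.array([0.3, 0.5, 0.7])
alpha, beta = np.ones(3), np.ones(3)
for t in range(500):
    a = ids_action(alpha, beta)
    r = rng.random() < true_means[a]                    # Bernoulli reward
    alpha[a] += r
    beta[a] += 1 - r
print("posterior means:", np.round(alpha / (alpha + beta), 3))
```

Note the trade-off the ratio encodes: an arm with moderate regret but high information gain can be preferred over the greedy arm, which is what makes IDS-style methods sample-efficient.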
Source journal

Artificial Intelligence (Engineering & Technology / Computer Science: Artificial Intelligence)
CiteScore: 11.20
Self-citation rate: 1.40%
Annual articles: 118
Review time: 8 months
Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.