Provably efficient information-directed sampling algorithms for multi-agent reinforcement learning
Qiaosheng Zhang, Chenjia Bai, Shuyue Hu, Zhen Wang, Xuelong Li
Artificial Intelligence, Volume 348, Article 104392 (2025). DOI: 10.1016/j.artint.2025.104392
Citations: 0
Abstract
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS). These algorithms draw inspiration from foundational concepts in information theory, and are proven to be sample efficient in MARL settings such as two-player zero-sum Markov games (MGs) and multi-player general-sum MGs. For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning a Nash equilibrium. The basic algorithm, referred to as MAIDS, employs an asymmetric learning structure in which the max-player first solves a minimax optimization problem based on the joint information ratio of the joint policy, and the min-player then minimizes the marginal information ratio with the max-player's policy fixed. Theoretical analyses show that it achieves a Bayesian regret of $\tilde{O}(\sqrt{K})$ over $K$ episodes. To reduce the computational load of MAIDS, we develop an improved algorithm called Reg-MAIDS, which enjoys the same Bayesian regret bound at a lower computational cost. Moreover, by leveraging the flexibility of the IDS principle in choosing the learning target, we propose two methods for constructing compressed environments based on rate-distortion theory, upon which we develop an algorithm, Compressed-MAIDS, wherein the learning target is a compressed environment. Finally, we extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either a Nash equilibrium or a coarse correlated equilibrium in a sample-efficient manner.
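To make the information-ratio idea concrete, below is a minimal single-agent sketch of information-directed sampling in a Bayesian bandit, the simplest setting where the ratio of squared expected regret to information gain appears. The toy Gaussian posterior, the variance-based proxy for information gain, and all identifiers are illustrative assumptions, not code from the paper, which instead works with joint and marginal information ratios over Markov-game policies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior over arm means, represented by Monte Carlo samples.
n_samples, n_arms = 10_000, 5
theta = rng.normal(loc=rng.normal(size=n_arms), scale=0.5,
                   size=(n_samples, n_arms))

a_star = theta.argmax(axis=1)                      # optimal arm per sample
p_star = np.bincount(a_star, minlength=n_arms) / n_samples

# Expected regret of playing each arm: E[theta[A*]] - E[theta[a]].
delta = theta.max(axis=1).mean() - theta.mean(axis=0)

# Variance-based proxy for information gain about the optimal arm A*:
# g[a] = Var over a* of E[theta[a] | A* = a*], weighted by P(A* = a*).
cond_mean = np.zeros((n_arms, n_arms))
for s in range(n_arms):
    mask = a_star == s
    if mask.any():
        cond_mean[s] = theta[mask].mean(axis=0)
g = ((cond_mean - theta.mean(axis=0)) ** 2 * p_star[:, None]).sum(axis=0)

# Deterministic IDS picks the arm minimizing the information ratio
# delta[a]^2 / g[a]; randomized IDS would minimize over distributions.
ratio = delta ** 2 / np.maximum(g, 1e-12)
print("information ratios:", np.round(ratio, 4))
print("IDS action:", ratio.argmin())
```

In MAIDS, this same kind of ratio is formed for joint policies (the joint information ratio, optimized by the max-player) and for one player's policy with the opponent's policy fixed (the marginal information ratio, minimized by the min-player).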
Journal description:
The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.