We Are Legion: High Probability Regret Bound in Adversarial Multiagent Online Learning

Impact Factor: 2.4 · Q2 · Automation & Control Systems
Sri Jaladi; Ilai Bistritz
DOI: 10.1109/LCSYS.2024.3519637
Journal: IEEE Control Systems Letters, vol. 8, pp. 2985-2990
Published: 2024-12-18
Citations: 0

Abstract

We study a large-scale multiagent online learning problem where the number of agents N is significantly larger than the number of arms K. The agents face the same adversarial online learning problem with K arms over T rounds, where the adversary chooses the cost vectors $\boldsymbol{l}(1), \ldots, \boldsymbol{l}(T)$ before the game begins. In each round t, each agent n picks an arm $a_{n}(t)$ and incurs a cost of $l_{a_{n}(t)}(t)$. Then, at the end of the round, all agents observe the costs of all arms $l_{1}(t), \ldots, l_{K}(t)$. The exponential weights algorithm achieves an order-wise optimal expected regret of $O(\sqrt{T})$ for each agent. However, the variance of the sum of regrets scales linearly with the number of agents, which is unacceptable for a large-scale multiagent system. To mitigate this, we propose a simple, fully distributed algorithm that achieves the same optimal expected sum of regrets but reduces the variance of the sum of regrets from $O(N)$ to $O(\min(N,K))$, with no communication required between the agents.
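The baseline the abstract refers to is the classical exponential weights (Hedge) algorithm under full-information feedback, which each agent could run independently. The sketch below is a minimal illustration of that standard baseline, not of the paper's proposed distributed algorithm; the learning rate $\eta = \sqrt{2\ln(K)/T}$ and the cost range $[0,1]$ are conventional assumptions, not details taken from the paper.

```python
import numpy as np

def hedge(costs, eta):
    """Exponential weights (Hedge) under full-information feedback.

    costs: (T, K) array of adversarial cost vectors l(1), ..., l(T), entries in [0, 1].
    eta:   learning rate; eta ~ sqrt(ln(K) / T) yields O(sqrt(T)) expected regret.
    Returns the algorithm's total expected cost and its regret against
    the best fixed arm in hindsight.
    """
    T, K = costs.shape
    log_w = np.zeros(K)          # log-weights, kept in log-space for numerical stability
    expected_cost = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()             # sampling distribution over the K arms
        expected_cost += p @ costs[t]
        log_w -= eta * costs[t]  # full-information multiplicative-weights update
    best_fixed = costs.sum(axis=0).min()
    return expected_cost, expected_cost - best_fixed
```

An agent that samples its arm from p each round attains the $O(\sqrt{T})$ expected regret above, but N agents sampling independently is exactly what makes the variance of the sum of regrets grow linearly in N, the problem the paper addresses.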
Source journal: IEEE Control Systems Letters (Mathematics - Control and Optimization)
CiteScore: 4.40 · Self-citation rate: 13.30% · Articles published: 471