We Are Legion: High Probability Regret Bound in Adversarial Multiagent Online Learning

Impact Factor: 2.4 · Q2 · Automation & Control Systems
Sri Jaladi; Ilai Bistritz
DOI: 10.1109/LCSYS.2024.3519637
Journal: IEEE Control Systems Letters, vol. 8, pp. 2985-2990
Published: 2024-12-18
Citations: 0

Abstract

We study a large-scale multiagent online learning problem where the number of agents N is significantly larger than the number of arms K. The agents face the same adversarial online learning problem with K arms over T rounds, where the adversary chooses the cost vectors $\boldsymbol{l}(1), \ldots, \boldsymbol{l}(T)$ before the game begins. In each round t, each agent n picks an arm $a_{n}(t)$ and incurs a cost of $l_{a_{n}(t)}(t)$. Then, at the end of the round, all agents observe the costs of all arms $l_{1}(t), \ldots, l_{K}(t)$. The exponential weights algorithm achieves an order-wise optimal expected regret of $O(\sqrt{T})$ for each agent. However, the variance of the sum of regrets scales linearly with the number of agents, which is unacceptable for a large-scale multiagent system. To mitigate this, we propose a simple, fully distributed algorithm that achieves the same optimal expected sum of regrets but reduces the variance of the sum of regrets from $O(N)$ to $O(\min(N,K))$, with no communication required between the agents.
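The baseline the abstract refers to is the classical exponential weights (Hedge) algorithm under full-information feedback, which each agent could run independently. The sketch below is a minimal illustration of that standard baseline, not of the paper's proposed distributed algorithm; the learning rate $\eta = \sqrt{2\ln(K)/T}$ and the cost range $[0,1]$ are conventional assumptions, not details taken from the paper.

```python
import numpy as np

def hedge(costs, eta):
    """Exponential weights (Hedge) under full-information feedback.

    costs: (T, K) array of adversarial cost vectors l(1), ..., l(T), entries in [0, 1].
    eta:   learning rate; eta ~ sqrt(ln(K) / T) yields O(sqrt(T)) expected regret.
    Returns the algorithm's total expected cost and its regret against
    the best fixed arm in hindsight.
    """
    T, K = costs.shape
    log_w = np.zeros(K)          # log-weights, kept in log-space for numerical stability
    expected_cost = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()             # sampling distribution over the K arms
        expected_cost += p @ costs[t]
        log_w -= eta * costs[t]  # full-information multiplicative-weights update
    best_fixed = costs.sum(axis=0).min()
    return expected_cost, expected_cost - best_fixed
```

An agent that samples its arm from p each round attains the $O(\sqrt{T})$ expected regret above, but N agents sampling independently is exactly what makes the variance of the sum of regrets grow linearly in N, the problem the paper addresses.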
Source journal: IEEE Control Systems Letters (Mathematics - Control and Optimization)
CiteScore: 4.40 · Self-citation rate: 13.30% · Articles published: 471