{"title":"We Are Legion: High Probability Regret Bound in Adversarial Multiagent Online Learning","authors":"Sri Jaladi;Ilai Bistritz","doi":"10.1109/LCSYS.2024.3519637","DOIUrl":null,"url":null,"abstract":"We study a large-scale multiagent online learning problem where the number of agents N is significantly larger than the number of arms K. The agents face the same adversarial online learning problem with K arms over T rounds, where the adversary chooses the cost vectors <inline-formula> <tex-math>$\\boldsymbol {l}(1), \\ldots ,\\boldsymbol {l}(T)$ </tex-math></inline-formula> before the game begins. Each round t, each agent n picks an arm <inline-formula> <tex-math>$a_{n}$ </tex-math></inline-formula>(t) and incurs a cost of <inline-formula> <tex-math>$l_{a_{n}(t)}$ </tex-math></inline-formula> (t). Then, at the end of the round, all agents observe the costs of all arms <inline-formula> <tex-math>$l_{1}(t), \\ldots ,l_{K}(t)$ </tex-math></inline-formula>. The exponential weights algorithm achieves an order-wise optimal expected regret of <inline-formula> <tex-math>$O(\\sqrt {T})$ </tex-math></inline-formula> for each agent. However, the variance of the sum of regrets scales linearly with the number of agents, which is unacceptable for a large-scale multi-agent system. 
To mitigate this, we propose a simple fully distributed algorithm that achieves the same optimal expected sum of regrets but reduces the variance of the sum of regrets from O(N) to <inline-formula> <tex-math>$O(\\min (N,K))$ </tex-math></inline-formula> with no communication required between the agents.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"8 ","pages":"2985-2990"},"PeriodicalIF":2.4000,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Control Systems Letters","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10806853/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
We study a large-scale multiagent online learning problem where the number of agents N is significantly larger than the number of arms K. The agents face the same adversarial online learning problem with K arms over T rounds, where the adversary chooses the cost vectors $\boldsymbol{l}(1), \ldots, \boldsymbol{l}(T)$ before the game begins. In each round t, each agent n picks an arm $a_{n}(t)$ and incurs a cost of $l_{a_{n}(t)}(t)$. Then, at the end of the round, all agents observe the costs of all arms $l_{1}(t), \ldots, l_{K}(t)$. The exponential weights algorithm achieves an order-wise optimal expected regret of $O(\sqrt{T})$ for each agent. However, the variance of the sum of regrets scales linearly with the number of agents, which is unacceptable for a large-scale multiagent system. To mitigate this, we propose a simple fully distributed algorithm that achieves the same optimal expected sum of regrets but reduces the variance of the sum of regrets from O(N) to $O(\min(N,K))$, with no communication required between the agents.
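For context, the baseline each agent runs in this setting is the exponential weights (Hedge) algorithm under full-information feedback. The following is a minimal single-agent sketch, not the paper's variance-reduced multiagent algorithm; the learning rate, cost range [0, 1], and random seed are illustrative assumptions.

```python
import numpy as np

def exponential_weights(costs, eta=None, rng=None):
    """Run exponential weights (Hedge) on a full-information adversarial
    problem. costs: (T, K) array of per-round arm costs, assumed in [0, 1].
    Returns the sequence of arms played and the total cost incurred."""
    rng = np.random.default_rng(0) if rng is None else rng
    T, K = costs.shape
    # Standard tuning for an O(sqrt(T log K)) expected regret bound.
    eta = np.sqrt(np.log(K) / T) if eta is None else eta
    cum = np.zeros(K)   # cumulative cost of each arm observed so far
    total = 0.0
    arms = []
    for t in range(T):
        # Sample an arm with probability proportional to exp(-eta * cum);
        # shifting by the minimum keeps the exponentials numerically stable.
        w = np.exp(-eta * (cum - cum.min()))
        p = w / w.sum()
        a = int(rng.choice(K, p=p))
        arms.append(a)
        total += costs[t, a]
        cum += costs[t]  # full information: every arm's cost is revealed
    return arms, total
```

The regret of a run is `total` minus the cost of the best fixed arm in hindsight, `costs.sum(axis=0).min()`. In the multiagent problem, N agents each sampling independently from the same distribution is what makes the variance of the summed regret grow with N, the issue the paper addresses.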