Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs
Tianyuan Jin, Hao-Lun Hsu, William Chang, Pan Xu
arXiv:2312.15549 (arXiv - MATH - Statistics Theory), 2023-12-24
We study the multi-agent multi-armed bandit (MAMAB) problem, where $m$ agents
are factored into $\rho$ overlapping groups. Each group represents a hyperedge,
forming a hypergraph over the agents. At each round of interaction, the learner
pulls a joint arm (composed of individual arms for each agent) and receives a
reward according to the hypergraph structure. Specifically, we assume there is
a local reward for each hyperedge, and the reward of the joint arm is the sum
of these local rewards. Previous work introduced the multi-agent Thompson
sampling (MATS) algorithm (Verstraeten et al., 2020) and derived a
Bayesian regret bound. However, it remains an open problem how to derive a
frequentist regret bound for Thompson sampling in this multi-agent setting. To
address this problem, we propose an efficient variant of MATS, the
$\epsilon$-exploring Multi-Agent Thompson Sampling ($\epsilon$-MATS) algorithm,
which performs MATS exploration with probability $\epsilon$ and adopts a
greedy policy otherwise. We prove that $\epsilon$-MATS achieves a worst-case
frequentist regret bound that is sublinear in both the time horizon and the
local arm size. We also derive a lower bound for this setting, which implies
that our frequentist regret upper bound is optimal up to constant and logarithmic
factors when the hypergraph is sufficiently sparse. Thorough experiments on
standard MAMAB problems demonstrate the superior performance and the improved
computational efficiency of $\epsilon$-MATS compared with existing algorithms
in the same setting.
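
The factored reward structure and the $\epsilon$-exploring selection rule described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy hypergraph (`HYPEREDGES`), the Gaussian posterior with variance $1/(n+1)$, the unit-variance reward noise, the single per-round coin flip between sampling and greedy play, and the brute-force search over joint arms, which stands in for whatever efficient maximization a practical implementation would use.

```python
import itertools
import math
import random

# Hypothetical toy instance: 3 agents with 2 arms each, grouped into
# 2 overlapping hyperedges (agent 1 belongs to both).
NUM_AGENTS = 3
NUM_ARMS = 2
HYPEREDGES = [(0, 1), (1, 2)]


def local_arm(joint_arm, edge):
    """Restrict a joint arm to the agents of one hyperedge."""
    return tuple(joint_arm[i] for i in edge)


def joint_value(joint_arm, theta):
    """Value of a joint arm: the sum of its local values over all hyperedges."""
    return sum(theta[e][local_arm(joint_arm, edge)]
               for e, edge in enumerate(HYPEREDGES))


def init_stats():
    """Empirical mean and pull count for every local arm of every hyperedge."""
    means, counts = [], []
    for edge in HYPEREDGES:
        arms = list(itertools.product(range(NUM_ARMS), repeat=len(edge)))
        means.append({a: 0.0 for a in arms})
        counts.append({a: 0 for a in arms})
    return means, counts


def select_joint_arm(means, counts, epsilon, rng):
    """With probability epsilon use posterior samples, otherwise empirical means."""
    explore = rng.random() < epsilon  # assumed: one coin flip per round
    theta = []
    for e in range(len(HYPEREDGES)):
        theta.append({
            a: rng.gauss(mu, 1.0 / math.sqrt(counts[e][a] + 1)) if explore else mu
            for a, mu in means[e].items()
        })
    # Brute-force maximization over all joint arms; viable for a toy instance only.
    candidates = itertools.product(range(NUM_ARMS), repeat=NUM_AGENTS)
    return max(candidates, key=lambda arm: joint_value(arm, theta))


def update(means, counts, joint_arm, local_rewards):
    """Fold one observed local reward per hyperedge into the running means."""
    for e, edge in enumerate(HYPEREDGES):
        a = local_arm(joint_arm, edge)
        counts[e][a] += 1
        means[e][a] += (local_rewards[e] - means[e][a]) / counts[e][a]


def run(true_means, horizon, epsilon, seed=0):
    """Simulate the loop with unit-variance Gaussian noise on local rewards."""
    rng = random.Random(seed)
    means, counts = init_stats()
    total = 0.0
    for _ in range(horizon):
        arm = select_joint_arm(means, counts, epsilon, rng)
        rewards = [rng.gauss(true_means[e][local_arm(arm, edge)], 1.0)
                   for e, edge in enumerate(HYPEREDGES)]
        update(means, counts, arm, rewards)
        total += sum(rewards)
    return means, counts, total
```

Setting `epsilon=1.0` makes every round a sampling round, while `epsilon=0.0` makes the sketch purely greedy on the empirical means. Intermediate values draw posterior samples only on a fraction of rounds, which is one plausible reading of the computational saving the abstract mentions.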